Abstract

In this paper, we consider the problem of estimating the covariance kernel and its eigenvalues
and eigenfunctions from sparse, irregularly observed, noise-corrupted and (possibly) correlated
functional data. We present a method based on pre-smoothing of individual sample curves
through an appropriate kernel. We show that the naive empirical covariance of the pre-smoothed
sample curves gives a highly biased estimator of the covariance kernel along its diagonal. We
address this problem by estimating the diagonal and off-diagonal parts of the covariance
kernel separately. We then present a practical and efficient method for choosing the bandwidth
for the kernel by using an approximation to the leave-one-curve-out cross validation score. We
prove that under standard regularity conditions on the covariance kernel and assuming i.i.d.
samples, the risk of our estimator, under $L^2$ loss, achieves the optimal nonparametric rate when
the number of measurements per curve is bounded. We also show that even when the sample
curves are correlated in such a way that the noiseless data has a separable covariance structure,
the proposed method is still consistent, and we quantify the role of this correlation in the risk of
the estimator.
1 Introduction



Noisy functional data arise frequently in various fields, for example longitudinal data analysis, 
chemometrics, econometrics, etc. (Ferraty and Vieu, 2006). Depending on how the measurements
are taken, there can be two different scenarios: (i) individual curves are measured on a dense,
regular grid; (ii) the measurements are observed on a sparse, and typically irregular, set of points
in an interval. The first situation usually arises when the data are recorded by some automated
instrument, e.g. in chemometrics, where the curves represent the spectra of certain chemical sub- 
stances. The second scenario is more typical in longitudinal studies where the individual curves 
could represent the level of concentration of some substance, and the measurements on the subjects 
may be taken only at irregular time points. 

In these settings, when the goal of analysis is either data compression, model building or study- 
ing covariate effects, one may want to extract information about the functional principal components 
(i.e., the eigenvalues and eigenfunctions of the covariance kernel). The eigenfunctions give a nice 
basis for representing the data, and hence are very useful in problems related to model building and 
prediction for functional data. For example, they have been used extensively in functional linear 
regression (Cardot, Ferraty and Sarda (1999), Hall and Horowitz (2007), Cai and Hall (2006)). 
Ramsay and Silverman (2005) and Ferraty and Vieu (2006) give extensive surveys of the appli- 
cations of functional principal components. In the first scenario, i.e., data on a regular grid, as 
long as the individual curves are smooth, the measurement noise level is low, and the grid is dense 
enough, one can essentially treat the data to be on a continuum, and employ techniques similar to 
the ones used in classical multivariate analysis. However, the irregular nature of data in the second
scenario, and the associated measurement noise require a different treatment. In this paper, we 
propose a kernel smoothing approach to estimate the covariance surface and its functional principal 
components based on sparse, irregularly observed, noise corrupted functional data. This method 
is based on the pre-smoothing of individual curves, with suitable modification of the diagonal, for 
estimating the covariance kernel. We prove the consistency and derive the rate of convergence of the 
proposed estimator. Also, in many practical circumstances the sample curves are correlated,
for example, in spatio-temporal data (Hlubinka and Prchal, 2007), online auction data (Peng and
Müller, 2008), and time course gene expression data (Spellman et al., 1998). However, in the existing
literature, most theoretical studies of principal components analysis assume i.i.d. sample
curves. The analysis presented in this paper shows that the asymptotic consistency of the principal
components holds for the proposed method even under certain types of correlation structures (as
discussed later).

Before we go into the details of the proposed procedure, we first give an outline of the data model 
and an overview of different approaches to this problem. Suppose that we observe $n$ realizations
of an $L^2$-stochastic process $\{X(t) : t \in [0,1]\}$ at a sequence of points on the interval $[0,1]$ (or,
more generally, on an interval $[a,b]$), with additive measurement noise. That is, the observed data
$\{Y_{ij} : 1 \le j \le m_i;\ 1 \le i \le n\}$ can be modeled as:

$$Y_{ij} = X_i(T_{ij}) + \sigma \varepsilon_{ij}, \qquad (1)$$

where $\{\varepsilon_{ij}\}$ are i.i.d. with mean 0 and variance 1. Since $X(t)$ is an $L^2$ stochastic process, by Mercer's
Theorem (Ash, 1972) there exists a positive semi-definite kernel $C(\cdot,\cdot)$ such that $\mathrm{Cov}(X(s), X(t)) =
C(s,t)$ and each $X_i(t)$ has the following a.s. representation in terms of the eigenfunctions of the
kernel $C(\cdot,\cdot)$:

$$X_i(t) = \mu(t) + \sum_{\nu=1}^{\infty} \sqrt{\lambda_\nu}\, \psi_\nu(t)\, \xi_{i\nu}, \qquad (2)$$

where $\mu(\cdot) = E(X(\cdot))$ is the mean function; $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$ are the eigenvalues of $C(\cdot,\cdot)$; $\psi_\nu(\cdot)$
are the corresponding orthonormal eigenfunctions; and the random variables $\{\xi_{i\nu} : \nu \ge 1\}$, for each
$i$, are uncorrelated with zero mean and unit variance. Furthermore, we assume that for each pair
$(i,j)$ with $1 \le i \ne j \le n$, the correlation is modelled by

$$\mathrm{Cov}(\xi_{i\nu}, \xi_{j\nu'}) = \rho_{ij}\, \mathbf{1}\{\nu = \nu'\}$$

for $1 \le \nu, \nu' \le M$, and $\rho_{ij}$ may be nonzero. This gives rise to a separable covariance structure for
the noiseless data. That is, the processes $\{X_i(\cdot)\}_{i=1}^n$ satisfy $\mathrm{Cov}(X_i(s), X_j(t)) = \rho_{ij} C(s,t)$, with
$\rho_{ii} = 1$. This holds, for example, when the principal component scores $\{\xi_{i\nu}\}_{i=1}^n$ for different $\nu$ are
i.i.d. stationary time series. Finally, in the observed data model (1), we assume that $T_i = \{T_{ij} :
j = 1, \ldots, m_i\}$ are randomly sampled from a continuous distribution.
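
To make the data model (1)-(2) concrete, the following is a minimal simulation sketch. It assumes a zero mean function, independent curves (all cross-curve correlations equal to zero), Gaussian scores and noise, uniformly distributed design points, and a truncated expansion with two sine eigenfunctions. The helper name `simulate_sparse_curves` and all parameter values are illustrative choices, not part of the paper.

```python
import numpy as np

def simulate_sparse_curves(n, eigvals, eigfuns, sigma, m_min=3, m_max=8, seed=0):
    """Draw n independent sparse noisy curves from model (1)-(2) with mu = 0.

    Hypothetical helper: Gaussian scores/noise and uniform design points are
    assumptions made only for this illustration.
    """
    rng = np.random.default_rng(seed)
    times, values = [], []
    for _ in range(n):
        m_i = int(rng.integers(m_min, m_max + 1))     # number of measurements on this curve
        t = np.sort(rng.uniform(0.0, 1.0, size=m_i))  # irregular design points T_ij
        xi = rng.standard_normal(len(eigvals))        # uncorrelated scores, unit variance
        x = np.zeros(m_i)
        for lam, psi, score in zip(eigvals, eigfuns, xi):
            x += np.sqrt(lam) * psi(t) * score        # truncated expansion (2)
        y = x + sigma * rng.standard_normal(m_i)      # additive noise as in (1)
        times.append(t)
        values.append(y)
    return times, values

def psi1(t):
    return np.sqrt(2.0) * np.sin(np.pi * t)           # orthonormal on [0, 1]

def psi2(t):
    return np.sqrt(2.0) * np.sin(2.0 * np.pi * t)     # orthogonal to psi1 on [0, 1]

times, values = simulate_sparse_curves(200, [1.0, 0.25], [psi1, psi2], sigma=0.5)
```

Each entry of `times` holds the design points of one curve and the matching entry of `values` holds its noisy measurements.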

As an example that is particularly suitable for modeling within the framework presented above, 
we consider the data on atmospheric radiation in Hlubinka and Prchal (2007). There, the mea- 
surements are taken from balloons from Earth's surface up to an altitude of 35 km. The data 
points corresponding to the $i$-th balloon are of the form $(a_j, z_j)$, where $a$ represents the altitude
and $z$ represents the average number of pulses at altitude $a$, which is thought to be proportional
to the radiation intensity. Thus, these vertical profiles of atmospheric radiation are considered as
individual realizations of functional data. That is, here the $a_j$'s are measurement points, the $z_j$'s are the
measurements and the subjects are indexed by time. Hence there is a natural dependence among
the sample curves observed over different time points. Moreover, it is reasonable to assume that
the dependence across time does not change with the vertical distance except possibly through a
long-term trend, i.e., the spatio-temporal covariance structure is separable.

Below we give a short overview of two existing approaches to the problem of estimation of 
functional principal components from sparse data. Yao, Müller and Wang (2005) propose a local
linear smoothing of the empirical covariances $\{C_i(T_{ij}, T_{ij'}) : j \ne j'\}_{i=1}^n$:

$$C_i(T_{ij}, T_{ij'}) = (Y_{ij} - \hat\mu(T_{ij}))(Y_{ij'} - \hat\mu(T_{ij'})),$$

where $\hat\mu$ is the estimate of the mean function $\mu(\cdot)$ obtained by local linear smoothing. They
prove asymptotic consistency of this estimator and the estimated eigenfunctions, by assuming i.i.d.
sample curves. Hall, Müller and Wang (2006) prove further that the problem of estimating the
covariance kernel and that of estimating its eigenfunctions are intrinsically different in that the
former is a two-dimensional smoothing problem while the latter is a one-dimensional problem,
which results in different choices for the optimal bandwidth. They also prove that the proposed local
polynomial estimator achieves the optimal nonparametric convergence rate with the optimal choice
of bandwidths, under the i.i.d. setting, when the number of measurements per curve is bounded.

Instead of the local polynomial approach, where one imposes regularization on the estimates by 
varying the bandwidth of the kernel, one can impose regularization by restricting the eigenfunctions 
in a known basis of smooth functions. This approach has been used by various researchers including 
Besse, Cardot and Ferraty (1997), Cardot (2000), James, Hastie and Sugar (2000) and Peng and 
Paul (2007). Peng and Paul (2007) propose to directly maximize the restricted log-likelihood under 
the working assumption of Gaussianity, such that the resulting estimator satisfies the geometry 
of the parameter space. This method is implemented through a Newton-Raphson algorithm on 
the Stiefel manifold of rectangular matrices with orthonormal columns. The latter space is the 
parameter space for the matrix of basis coefficients for the eigenfunctions. Furthermore, in Paul 
and Peng (2007) the authors prove that this restricted maximum likelihood (REML) estimator also 
achieves the optimal nonparametric rate when the number of measurements per sample curve is 
bounded and the sample curves are i.i.d. 

We now give a brief description of the estimation procedure proposed in this paper. The method
is partly motivated by the observation that the naive sample covariance based on the presmoothed
individual sample curves is a highly biased estimate along the diagonal of the covariance kernel
when $m_i$, the number of measurements per curve, is small. As can be seen clearly from (6) in Section
2.1, this bias does not vanish asymptotically unless $(\min_{1 \le i \le n} m_i) h_n \to \infty$ as $n \to \infty$, where $h_n$
is the bandwidth of the kernel smoother. Under the latter setting, Hall et al. (2006) discuss the
possibility of using a local linear smoother for individual sample curves and then performing a
PCA on the smoothed curves. Furthermore, when the design points $T_{ij}$ are regularly spaced and
sufficiently dense, they show that using conventional PCA for functional data (see statements and
conditions in Theorem 3 of that paper for details) one obtains root-$n$ consistent estimates of the
eigenvalues and eigenfunctions so that the problem is asymptotically equivalent to a parametric
problem. An interesting question is whether the naive kernel smoothing approach can be
suitably modified so that it produces estimators with good asymptotic risk properties even
when the $m_i$'s are relatively small. Our approach in this paper goes in this direction and
involves estimating the diagonal and the off-diagonal portions separately, and then merging them
together using a smooth weight kernel. The estimation of the off-diagonal portion is based on
presmoothing individual sample curves by a linearized kernel smoother. The estimation of the
diagonal part involves linearized kernel smoothing of the empirical variances. The task of selecting
an appropriate bandwidth, and the number of nonzero eigenvalues, is addressed by obtaining
a computationally efficient approximation to the leave-one-curve-out cross validation score. This
approximation procedure, as well as the asymptotic analysis of the estimators, is based on the
perturbation theory of linear operators.

Now we summarize the main contributions of this paper. Our approach of merging two separate
presmoothed linearized kernel estimates of the diagonal and the off-diagonal parts of the covariance
kernel is new and is computationally very efficient. We prove that the proposed estimator achieves
the optimal nonparametric rate when the observations are i.i.d. realizations of a finite dimensional
smooth stochastic process, and when the number of measurements per curve is bounded. This result
parallels the one obtained by Hall et al. (2006) for the local polynomial approach. Moreover, we
obtain explicit expressions for the integrated mean squared error of the estimated eigenfunctions
under a regime of separable covariance structure among the sample curves. The quantification of
the role of correlation in the risk behavior (Theorem 4.2) appears to be new in the literature
on functional data analysis. We also derive a lower bound on the rate of convergence
of the risk of the first eigenfunction (Theorem 4.3) which is sharper than an analogous (but more
general) bound obtained in Hall et al. (2006). This lower bound and the matching upper bound on
the rate of convergence for the i.i.d. case show that the proposed estimator attains the optimal
rate even when $\max_{1 \le i \le n} m_i \to \infty$, at least under the restricted setting described in Theorem 4.3.
Moreover, if the correlation between sample curves is "weak" in a suitable sense, then the optimal
rates of convergence for eigenfunctions in the correlated and i.i.d. cases are the same. Furthermore,
we show that our estimation procedure also allows for a computationally efficient approximation of
the leave-one-curve-out cross validation score, which is used for selecting the bandwidth for estimating
the eigenfunctions. This approximation is based on a perturbation analysis approach that is natural
given the form of our estimator. In the paper, we also show that the widely used prediction error
loss for cross validation is not correctly scaled in the current context. Thus we propose to use
the empirical Kullback-Leibler loss for the cross validation criterion.

The rest of the paper is organized as follows. In Section 2, we propose the estimation procedure 
and contrast it with the naive kernel smoothing approach. In Section 3, we propose an approxima- 
tion to the leave-one-curve-out cross validation score based on the perturbation theory for linear 
operators. In Section 4, we state the main results about the consistency and rate of convergence of 
the estimators of the covariance kernel and its eigenfunctions. In Section 5, we give an outline of 
the proof of the main results (Theorems 4.1 and 4.2) and discuss their implications. In Section 6, 
we give an overview of various related issues and future research directions. The proof details are 
provided in the appendices. 

2 Method 

Throughout this section, we assume that the mean curve has been estimated separately, and has 
been subtracted from the data. Thus, without loss of generality we assume that $\mu = 0$. Also, in
the asymptotic analysis carried out in Section 4, we make the same assumption to simplify the
exposition. The case of arbitrary $\mu$ with a sufficient degree of smoothness can be easily handled.

2.1 Naive kernel smoothing approach 

A popular method in nonparametric function estimation is to smooth the individual sample curves 
by a kernel averaging of the sample points. In principle, one can adopt a similar approach in the
current context. This means first smoothing the individual sample curves, then computing
the covariance of the "pre-smoothed" sample curves, and finally carrying out an eigen-analysis of this
"pre-smoothed" empirical covariance. In the following, we first describe briefly such an approach, and
then show that even in the case of i.i.d. data, the estimator thus obtained has an intrinsic bias
while estimating the diagonal of the covariance kernel, unless the number of measurements per
curve is large.

Let $K(\cdot)$ be a summability kernel with an adequate degree of smoothness, satisfying the
following conditions:

B1 (i) $\mathrm{supp}(K) = [-B_K, B_K]$ for some $B_K > 0$; (ii) $K$ is symmetric about 0; (iii) $\int K(x)\,dx = 1$;
(iv) $\int xK(x)\,dx = 0$; (v) $\int K'(x)\,dx = 0$; (vi) $\int xK'(x)\,dx = -1$.

We then define the presmoothed sample curves as follows:

$$\tilde X_i(t) = \frac{1}{m_i} \sum_{j=1}^{m_i} Y_{ij} K_{h_{n,i}}(t - T_{ij}), \qquad i = 1, \ldots, n, \qquad (3)$$

where $K_h(x) = h^{-1}K(h^{-1}x)$ for $h > 0$ and $h_{n,i}$ is the bandwidth for the $i$-th curve. Then the
empirical covariance based on the presmoothed curves is simply

$$\hat C(s,t) = \frac{1}{n} \sum_{i=1}^{n} \tilde X_i(t) \tilde X_i(s). \qquad (4)$$
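
The presmoothing step (3) and the naive empirical covariance (4) can be sketched as follows. This is a minimal illustration under stated assumptions: a common bandwidth `h` for all curves, evaluation on a user-supplied grid, and an Epanechnikov kernel standing in for $K$ (it is symmetric, compactly supported and integrates to one, but is less smooth than condition B1 asks for). The function names are hypothetical.

```python
import numpy as np

def epanechnikov(x):
    """Symmetric kernel supported on [-1, 1] that integrates to one (illustrative stand-in for K)."""
    return np.where(np.abs(x) <= 1.0, 0.75 * (1.0 - x**2), 0.0)

def presmooth(t_obs, y_obs, grid, h):
    """Equation (3): (1/m_i) * sum_j Y_ij * K_h(t - T_ij), evaluated at each grid point t."""
    u = (grid[:, None] - t_obs[None, :]) / h      # scaled distances, shape (len(grid), m_i)
    return (epanechnikov(u) / h) @ y_obs / len(t_obs)

def naive_covariance(times, values, grid, h):
    """Equation (4): average over curves of the outer products of presmoothed curves."""
    X = np.stack([presmooth(t, y, grid, h) for t, y in zip(times, values)])
    return X.T @ X / X.shape[0]
```

For a single curve with one observation $Y = 2$ at $T = 0.5$ and $h = 0.1$, `presmooth` evaluated at $t = 0.5$ returns $2 \cdot K(0)/h = 15$, which already hints at the $1/h_n$ inflation along the diagonal discussed in this section.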

In the following, we derive an expression for the expectation of $\hat C(s,t)$ as an estimator of $C(s,t)$ to
quantify the bias, when $h_{n,i} = h_n$ for all $i$, under the assumption that $C(\cdot,\cdot)$ is twice continuously
differentiable. Suppose for simplicity that the density of the design points $\{T_{ij}\}_{j=1}^{m_i}$, for each subject,
is uniform on $[0,1]$. Define $C(t) = C(t,t)$ for $t \in [0,1]$, and $K_2(\cdot) = \int K(\cdot - u)K(-u)\,du$. Also, we
assume that the $m_i$'s are given. In the following proposition the bounds hold as $h_n \to 0$.

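Proposition 2.1. When $s \ne t$,

$$E[\tilde X_i(s)\tilde X_i(t)] = \frac{1}{m_i h_n} K_2\Big(\frac{s-t}{h_n}\Big)\big(C(t) + \sigma^2\big) + \frac{1}{m_i} C'(t) \int uK(-u)K\Big(\frac{s-t}{h_n} - u\Big)du + \Big(1 - \frac{1}{m_i}\Big)C(s,t) + \frac{1}{m_i}O(h_n) + O(h_n^2). \qquad (5)$$

And,

$$E[\tilde X_i(t)^2] = \frac{1}{m_i h_n} K_2(0)\big(C(t) + \sigma^2\big) + \Big(1 - \frac{1}{m_i}\Big)C(t) + \frac{1}{m_i}O(h_n) + O(h_n^2). \qquad (6)$$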

The $O(\cdot)$ terms involve $\sup_{t \in [0,1]} |C''(t)|$, $\sup_{s,t \in [0,1]} \| \mathcal{D}^2 C(s,t) \|$ and $\int u^2 K(u)\,du$, where $\mathcal{D}^2$ is the
Hessian operator.

By Proposition 2.1, it is easy to see that $E[\tilde X_i(s)\tilde X_i(t)] = (1 - \frac{1}{m_i})C(s,t) + O(h_n^2)$ if $|s-t| > 2B_K h_n$,
since the first two terms in (5) both vanish, as well as the $O(h_n)$ term (see the proof in Appendix
C for more details). This shows that $\hat C(s,t)$ should be multiplied by $m_i/(m_i - 1)$ to get rid of the
trivial bias. However, (5) and (6) also show that the empirical covariance $\hat C(s,t)$ is a highly biased
estimate of $C(s,t)$ near the diagonal even after this trivial modification, unless $h_n \min_{1 \le i \le n} m_i \to
\infty$. This is because the first terms in (5) and (6) are always positive along the diagonal (i.e., when
$|s-t| < 2B_K h_n$), which results in overestimation. In fact the degree of overestimation becomes very
large (by a scale factor of $h_n^{-1}$) as soon as $|s-t| < 2B_K h_n$. This demonstrates clearly that the naive
kernel smoothing approach is intrinsically biased and needs to be appropriately modified.

To understand the reason for this bias, notice that if a pair of points $(T_{ij}, T_{ij'})$, for some
$1 \le j \ne j' \le m_i$, is randomly sampled from $[0,1]^2$, then it has a probability of the order $O(h_n^2)$
to be in a neighborhood of length and width $h_n$ of a given point $(s,t)$ (which is away from the
diagonal). In contrast, there is $O(h_n)$ probability of a randomly chosen point $T_{ij}$ to belong to a
neighborhood of length $h_n$ of the point $(t,t)$ along the diagonal. Therefore, measurements are much
denser along the diagonal and this explains the difference in rates.
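
The difference between the two rates can be checked with a small Monte Carlo sketch. It assumes uniform design points on $[0,1]$ and uses square and interval neighborhoods of side $h$; the function name and the default values of `s`, `t` and the number of draws are illustrative.

```python
import numpy as np

def neighborhood_frequencies(h, s=0.3, t=0.7, n_draws=200_000, seed=0):
    """Estimate how often design points fall near (s, t) versus near t on the diagonal."""
    rng = np.random.default_rng(seed)
    t1 = rng.uniform(size=n_draws)
    t2 = rng.uniform(size=n_draws)
    # A pair (T_ij, T_ij') landing in an h-by-h square around the off-diagonal point (s, t).
    off_diag = np.mean((np.abs(t1 - s) < h / 2) & (np.abs(t2 - t) < h / 2))
    # A single point T_ij landing in an interval of length h around t (the diagonal case).
    on_diag = np.mean(np.abs(t1 - t) < h / 2)
    return off_diag, on_diag

off_diag, on_diag = neighborhood_frequencies(h=0.1)
```

With `h = 0.1` the first frequency should be close to $h^2$ and the second close to $h$, so the diagonal receives roughly $1/h$ times as many contributing observations.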

2.2 Modification to naive kernel smoothing 

In this section, we propose a modification to deal with the bias in the naive kernel smoothing 
approach described in Section 2.1. We propose to remedy the effect of unequal scale along the 
diagonal of the covariance kernel (and the resulting bias) by estimating the diagonal and the off- 
diagonal parts separately. We then use a suitable (smooth) weight kernel to combine those two 
estimates together. 

Throughout the paper, we assume that the density of the time-points $\{T_{ij}\}$ is known and is
denoted by $g(\cdot)$. In practice we can estimate $g$ from the data separately. We further assume that
there are constants $0 < c_0 \le c_1 < \infty$ such that $c_0 \le g(\cdot) \le c_1$.

We also propose to use a linearized version of the kernel smoothing to reduce the bias while
controlling the variance. For this purpose, define $Q(s,t)$ to be a tensor-product kernel (that is, a
kernel of the form $Q(s,t) = Q(s)Q(t)$ for some smooth function $Q$) with the following properties,
together referred to as condition B2:

(i) $Q$ is supported on $[-C_Q, C_Q]$, for some $C_Q > 0$, and $Q(\cdot) \ge 0$;

(ii) $\| Q \|_\infty < \infty$;

(iii) $\sum_{k \in \mathbb{Z}} Q(x - k) = 1$;

(iv) $Q$ is symmetric about 0.

Property (iii) can be rephrased as saying that integer translates of $Q$ form a partition of unity. As
an example, the B-spline basis functions (Chui, 1987) satisfy all four properties. Let $Q_h(\cdot,\cdot)$ denote
the kernel $Q(h^{-1}\cdot, h^{-1}\cdot)$.
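
As a quick numerical illustration of condition B2, the sketch below checks the four properties for the linear B-spline (the "hat" function) centred at zero. This particular choice of $Q$ is an example picked here for simplicity and is not necessarily the kernel used in the paper; the helper names are hypothetical.

```python
import numpy as np

def hat(x):
    """Linear B-spline centred at 0: nonnegative, bounded, symmetric, supported on [-1, 1]."""
    return np.maximum(1.0 - np.abs(x), 0.0)

def translate_sum(x, k_min=-10, k_max=10):
    """Sum of integer translates Q(x - k); only finitely many terms are nonzero at each x."""
    return sum(hat(x - k) for k in range(k_min, k_max + 1))

x = np.linspace(-3.0, 3.0, 601)
total = translate_sum(x)   # property (iii): should equal 1 everywhere on this range
```

Rescaling the argument by $h_n$ turns this into a partition of unity on a grid with spacing $h_n$, which is what the linearized estimators below rely on.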

For estimation of the diagonal $C(t) = C(t,t)$, let $\hat C(t) := \hat C_*(t) - \hat\sigma^2$, where $\hat\sigma^2$ is an estimator of
$\sigma^2$ (discussed in Section 2.3), and $\hat C_*(t)$ is the estimate of $C(t) + \sigma^2$ obtained by using a linearized
kernel smoothing of the terms $\{Y_{ij}^2 : j = 1, \ldots, m_i;\ i = 1, \ldots, n\}$. This is because, for each pair
$(i,j)$, the conditional expectation of the quantity $Y_{ij}^2$ (conditional on $T_i$ and $m_i$) is $C(T_{ij}, T_{ij}) + \sigma^2$.
Define a grid on [0, 1] with grid spacings hn and denote the grid points by {s; : Z = 1, . . . , L„} where 
Ln = j^ for an appropriately chosen c^ ~ 1. Then define, 

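$$\hat C_{*,h_n}(t) = \frac{1}{g(t)} \frac{1}{n} \sum_{i=1}^{n} \sum_{l=1}^{L_n} \big[ S_i(s_l) + (t - s_l) S_i'(s_l) \big] Q_{h_n}(t - s_l), \qquad (7)$$

with

$$S_i(s) = \frac{1}{m_i} \sum_{j=1}^{m_i} Y_{ij}^2 K_{h_n}(s - T_{ij}). \qquad (8)$$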






Note that (7) is a linearized version of conventional kernel smoothing, which can be interpreted
as a local linear smoothing of the empirical variances. A similar principle is applied to construct
an estimator of the off-diagonal part (see (9) below). The linearization has two advantages: on one
hand, it helps in reducing the bias in the estimate; on the other hand, it facilitates efficient
computation both in terms of estimation and model selection. The main difference between this linearization
approach and local linear smoothing is that we use $g(t)$ (or an
estimate of $g(t)$) in the denominator, while in local linear smoothing the denominator implicitly
is a local estimate of $g$ obtained by averaging the smoothing kernel in a neighborhood of $t$. Note
that, as opposed to our estimator of $g$, which uses a different bandwidth from the one used for estimating
the covariance, local linear smoothing essentially uses the same bandwidth for estimating both $g$
and $C$, and thus it can suffer from instability. More specifically, the local linear estimator of Yao et
al. (2005) involves ratios with a denominator consisting of essentially the number of time points
falling in a small interval. Since the time points are assumed to be randomly distributed and are
sparse, in practice this can cause instability.

Let $\tilde X_i(t)$ be the $i$-th smoothed sample curve as defined in (3), and $\tilde X_i'(t)$ be the derivative of
$\tilde X_i(t)$. Then define the estimate of the off-diagonal part as (with a slight abuse of notation)

$$\hat C_{h_n}(s,t) = \frac{1}{g(s)g(t)} \frac{1}{n} \sum_{i=1}^{n} w(m_i) \sum_{l,l'=1}^{L_n} \Big[ \big(\tilde X_i(s_l) + (s - s_l)\tilde X_i'(s_l)\big) \cdot \big(\tilde X_i(s_{l'}) + (t - s_{l'})\tilde X_i'(s_{l'})\big) Q_{h_n}(s - s_l, t - s_{l'}) \Big]. \qquad (9)$$

Here $w(m_i) = \frac{m_i}{m_i - 1}$ is a weight function which is determined through an asymptotic bias analysis
(Proposition 2.1). Note that, as long as $|s-t| > Ah_n$ for some constant $A$ depending on $B_K$ and
$C_Q$, in the inner sum in definition (9), the terms for which $l = l'$ are absent. Therefore, according
to our analysis in the previous section, they do not contribute anything by way of bias.
Now let VF(-, •) be a weight kernel on the domain [0, 1]^ defined as 



W{s,t):=W{s-t) = I . ' '2 (10) 




Define Wy^ {s,t) = W{{s - t)/hn) and Wj^ {s,t) = 1 — Wj^ is,t), where hn = Ahn for the above 
A > 0. We then smooth the kernels Wr and Wr by convolving them with a Gaussian kernel 
Qr„{') with a small bandwidth r„ (in the sense that r„ = o{hn))- And with an abuse of notation, 
denote the resulting kernels also by Wr and Wr , respectively. Finally, we are ready to define the 
proposed combined estimator of C{s,t) as 

a,h„(s,t) = Tr^j5,i)C,„(s,t) + VF^Js,t) max{d,„(^),/i2}, (n) 

where $\hat C_{h_n}(\cdot) := \hat C_{*,h_n}(\cdot) - \hat\sigma^2$. The maximum in the second term simply guarantees that
the estimator of the diagonal is nonnegative while keeping the bias of order $O(h_n^2)$.

We now discuss briefly the computational aspects of the proposed estimator. A key step is the
computation of the functions $S_i(\cdot)$ and $\tilde X_i(\cdot)$ and their derivatives at the grid points $s_l$, $l = 1, \ldots, L_n$.
Each one of these computations requires $O(m_i)$ floating point operations (for each $i = 1, \ldots, n$).
From these, we obtain $\hat C_{h_n}(s,t)$ and $\hat C_{*,h_n}(t)$ by using (9) and (7), respectively. Both expressions
are in the form of discrete convolutions, and hence can be computed very rapidly by using the
Fast Fourier Transform. Thus, the estimation procedure is computationally very efficient, with
$O(n \bar m L_n \log L_n)$ computations on the whole grid, where $\bar m = \max_i m_i$.
2.3 Estimation of $\sigma^2$

Here we briefly outline a method for estimating the error variance $\sigma^2$. The method is similar to
the approach taken in Yao, Müller and Wang (2006), and hence we omit the details.

First, for a given bandwidth $h_n$, we estimate the function $C(s,t)$ for $|s-t| > Ah_n$, for some
$A$ depending on $B_K$ and $C_Q$, using (9). Then, as in Yao et al. (2006), we estimate the diagonal
$\{C(t) : t \in [0,1]\}$, using an oblique linear interpolation, by

$$\hat C_{0,h_n}(t) = \int_{A_1}^{A_2} \frac{1}{2}\big(\hat C_{h_n}(t - uh_n, t + uh_n) + \hat C_{h_n}(t + uh_n, t - uh_n)\big)\,dG(u), \qquad (12)$$

for some probability distribution function $G$ supported on $[A_1, A_2]$ where $A_1 > A$. On the other
hand, we estimate the curve $\{C(t) + \sigma^2 : t \in [0,1]\}$ by $\hat C_{*,h_n}(t)$ defined in (7). Now, we estimate
$\sigma^2$ by

$$\hat\sigma^2 = \frac{1}{T_1 - T_0} \int_{T_0}^{T_1} \big(\hat C_{*,h_n}(t) - \hat C_{0,h_n}(t)\big)\,dt, \qquad (13)$$

where $0 < T_0 < T_1 < 1$. It can be shown (Corollary 4.1 in Section 4) that the estimator $\hat\sigma^2$ thus
obtained is consistent for an appropriate choice of $h_n$.

3 Bandwidth selection 

The choice of optimal bandwidth for the kernel is a key step in any kernel-based estimation pro- 
cedure. Yao et al. (2005) use a leave-one-curve-out cross validation score based on the prediction 
error for selecting the bandwidth of the smoother, and an AIC approach for selecting the number 
of non-zero eigenvalues. However, leave-one-curve-out cross validation is computationally very ex- 
pensive. Also, as shown below, the prediction error loss is not an appropriate criterion for cross 
validation under the current context. Therefore, in this paper we address the issue of model selec- 
tion by producing an approximation to the leave-one-curve-out cross validation score based on the 
empirical Kullback-Leibler loss. The approximation is based on the idea that the estimator obtained
by dropping any single curve is a small perturbation of the estimator based on the whole data (Peng 
and Paul, 2007). In particular, we use perturbation theory of linear operators to quantify this per- 
turbation and produce a first order approximation to the CV score that is computationally efficient. 
It also enables us to select the bandwidth and the dimension of the process simultaneously. 

We first discuss the choice of the loss function, which is very important for a cross-validation
scheme. We point out that the prediction problem is intrinsically different from the
estimation of the covariance kernel. We find that the criterion based on prediction error loss is
not correctly scaled, as opposed to the one based on empirical Kullback-Leibler loss. To make this
point clear, we examine these two cross validation criteria in detail.

Define $Y_i = (Y_{ij})_{j=1}^{m_i}$, $\mu_i = (\mu(T_{ij}))_{j=1}^{m_i}$, $\psi_{i\nu} = (\psi_\nu(T_{ij}))_{j=1}^{m_i}$. We assume that the covariance
kernel can be represented using $K$ orthonormal eigenfunctions for some $K \ge 1$. Then the leave-one-curve-out
cross validation score based on the prediction error loss is given by

n nii 

Here Y}~'\t) = Jl^-'\t) + J2u^i 3v '^^^'^(0> where Jl^-'\t) and ipi~'\t) are the estimates of fi{t) 
and ipu{t) computed from observations {Yj/}" z^. Also, Q^ is the estimated principal component 



score based on observations {Yj/}" z^. Note that, the estimated principal components scores C^^ 
can be obtained through the procedure described in Yao et al. (2005), even though it will not be
necessary for the model selection procedure we shall adopt.

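On the other hand, the CV score based on the empirical Kullback-Leibler loss is given by

$$CV_*(K, h_n) = \sum_{i=1}^{n} \ell_i\big(Y_i;\ \hat\mu_i^{(-i)},\ \hat\Sigma_{i,K}^{(-i)}\big), \qquad (15)$$

where

$$\hat\Sigma_{i,K}^{(-i)} = \sum_{\nu=1}^{K} \hat\lambda_\nu^{(-i)}\, \hat\psi_{i\nu}^{(-i)} \big(\hat\psi_{i\nu}^{(-i)}\big)^T + \hat\sigma^2_{(-i)} I_{m_i},$$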

and $\ell_i$ is (up to an additive constant) the negative log-likelihood of the $i$-th observation under the
working assumption of Gaussianity, which is

$$\ell_i(Y_i; \mu_i, \Sigma_i) = \frac{1}{2}\log|\Sigma_i| + \frac{1}{2}\mathrm{tr}\big(\Sigma_i^{-1}(Y_i - \mu_i)(Y_i - \mu_i)^T\big).$$
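
The per-curve Gaussian loss can be computed directly from its definition. The sketch below is a minimal illustration, assuming the covariance of the $i$-th observation vector has the low-rank-plus-noise form implied by the model (eigenvalues times outer products of eigenfunction values at the design points, plus $\sigma^2$ times the identity). The function names `model_covariance` and `gaussian_loss` are hypothetical.

```python
import numpy as np

def model_covariance(lams, Psi, sigma2):
    """Covariance of one observation vector: sum_v lam_v psi_v psi_v^T + sigma2 * I.

    Psi has shape (m_i, K); column v holds the v-th eigenfunction at the design points.
    """
    lams = np.asarray(lams, dtype=float)
    return (Psi * lams) @ Psi.T + sigma2 * np.eye(Psi.shape[0])

def gaussian_loss(y, mu, Sigma):
    """0.5 * log|Sigma| + 0.5 * (y - mu)^T Sigma^{-1} (y - mu), i.e. the loss up to a constant."""
    r = np.asarray(y, dtype=float) - np.asarray(mu, dtype=float)
    sign, logdet = np.linalg.slogdet(Sigma)
    if sign <= 0:
        raise ValueError("Sigma must be positive definite")
    quad = r @ np.linalg.solve(Sigma, r)   # equals tr(Sigma^{-1} r r^T)
    return 0.5 * logdet + 0.5 * quad
```

Summing `gaussian_loss` over curves, with the mean and covariance evaluated at estimates that leave out the corresponding curve, gives a cross validation score of the Kullback-Leibler type.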

To gain an understanding of what these CV scores are approximating, we assume that we have
two independent samples, each with $n$ i.i.d. sample curves. Furthermore, to simplify exposition,
we assume that $\mu = 0$. Suppose that the estimates $\hat\Psi = \{\hat\psi_\nu\}_{\nu=1}^{K}$, $\hat\Lambda = \{\hat\lambda_\nu\}_{\nu=1}^{K}$ are obtained
from the first sample. Then a leave-one-curve-out CV score can be reasonably approximated
by substituting these estimates in the corresponding empirical loss function based on the second
sample, and with an abuse of notation we also denote this quantity by CV. If $\ell_i(\hat\Psi, \hat\Lambda)$ denotes
the loss function corresponding to the $i$-th observation in the second sample, then the CV score is
given by $\frac{1}{n}\sum_{i=1}^{n} \ell_i(\hat\Psi, \hat\Lambda)$. For simplicity, we assume that there is a true model $(\Psi_*, \Lambda_*)$ within the
class of models we are considering. A first order expansion of the difference between the CV scores
under the true and estimated parameters for the empirical Kullback-Leibler loss shows that, with
high probability,



-t n 1 " 

-V£i($,A)--V^i(*„A,) 
n ^-^ n ^-^ 

1 " 



i=l 



+0 



/logn 



n 



-V ||s'/'(Sri-S-.i)s' 



Uvl/2 ||2 



i=l 



1/2N 



(16) 



where $\| \cdot \|_F$ is the Frobenius norm, and $\Sigma_{*i}$ and $\hat\Sigma_i$ are the covariance matrices of the
observations $Y_i = (Y_{i1}, \ldots, Y_{im_i})^T$ corresponding to the true parameter $(\Psi_*, \Lambda_*)$ and the estimates
$(\hat\Psi, \hat\Lambda)$, respectively. Since we can essentially ignore the $O(\cdot)$ term in (16) as long as
$\frac{1}{n}\sum_{i=1}^{n} \| \hat\Sigma_i^{-1/2}(\Sigma_{*i} - \hat\Sigma_i)\hat\Sigma_i^{-1/2} \|_F^2$ is not too small, (16) gives a quadratic approximation to the CV score.
Notice that, in each term within the summation of this quadratic approximation, directions with
high variability are down-weighted by the multiplicative factors $\hat\Sigma_i^{-1/2}$. Therefore this CV score
based on the empirical Kullback-Leibler loss is properly scaled. Moreover, note that approximation
(16) does not really depend on Gaussianity but only on the tail of the distributions involved.



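On the other hand, it can be shown by simple algebra that, up to a multiplicative factor, the
CV score based on the prediction error loss is $CV = \frac{1}{n}\sum_{i=1}^{n} \tilde\ell_i(\hat\Psi, \hat\Lambda)$ where $\tilde\ell_i(\hat\Psi, \hat\Lambda) = \mathrm{tr}(\hat\Sigma_i^{-2} S_i)$,
and $S_i = (Y_i - \mu_i)(Y_i - \mu_i)^T$ is the empirical covariance matrix corresponding to the $i$-th observation
vector. The corresponding difference of the CV scores between estimated and true parameters
becomes (ignoring the multiplicative constant),

$$\frac{1}{n}\sum_{i=1}^{n} \tilde\ell_i(\hat\Psi, \hat\Lambda) - \frac{1}{n}\sum_{i=1}^{n} \tilde\ell_i(\Psi_*, \Lambda_*)$$

$$= \frac{1}{n}\sum_{i=1}^{n} \mathrm{tr}\big[(\hat\Sigma_i^{-2} - \Sigma_{*i}^{-2})\Sigma_{*i}\big] + \frac{1}{n}\sum_{i=1}^{n} \mathrm{tr}\big[(\hat\Sigma_i^{-2} - \Sigma_{*i}^{-2})(S_i - \Sigma_{*i})\big]$$

$$= \frac{1}{n}\sum_{i=1}^{n} \mathrm{tr}\big[\Sigma_{*i}^{-1/2}(A_i^2 - I_{m_i})\Sigma_{*i}^{-1/2}\big] + O\left(\sqrt{\frac{\log n}{n}} \Big(\frac{1}{n}\sum_{i=1}^{n} \big\| \Sigma_{*i}^{-1/2}(A_i^2 - I_{m_i})\Sigma_{*i}^{-1/2} \big\|_F^2\Big)^{1/2}\right) \qquad (17)$$

with high probability. Here $A_i = \Sigma_{*i}^{1/2} \hat\Sigma_i^{-1} \Sigma_{*i}^{1/2}$, which is already properly scaled. Therefore, from
(17) it is clear that this CV score itself is not correctly scaled. Also, the expression
$\frac{1}{n}\sum_{i=1}^{n} \mathrm{tr}\big[\Sigma_{*i}^{-1/2}(A_i^2 - I_{m_i})\Sigma_{*i}^{-1/2}\big]$ appearing in (17) is not necessarily nonnegative. This means that the prediction
error loss does not enjoy the pleasing property of the Kullback-Leibler loss that the minimum of
the expected loss occurs at the true parameter. Hence the use of the prediction error loss is not
recommended for the current problem.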

3.1 First order approximation 

Direct computation of the criterion $CV_*(K, h_n)$ (equation (15)) is a laborious process since we
need to compute $\hat C_c^{(-i)}(s,t)$ and perform its eigen-analysis for every $i = 1, \ldots, n$. Therefore, we
propose to approximate $CV_*(K, h_n)$ by using a first order approximation to the quantities $\hat\mu_i^{(-i)}$,
$\hat\psi_\nu^{(-i)}(\cdot)$ and $\hat\lambda_\nu^{(-i)}$ around the estimates $\hat\mu_i$, $\hat\psi_\nu(\cdot)$ and $\hat\lambda_\nu$, respectively. The approximations of the
eigenfunctions and eigenvalues are based on a perturbation analysis approach. The key idea is that
the leave-one-curve-out estimator $\hat C_c^{(-i)}$ of the covariance can be viewed as a perturbation of the
linear operator $\hat C_c$. The key component is Proposition 3.1, which uses a result on perturbation
of eigenfunctions of a linear operator (Lemma 7.1 in Appendix A). Note that our approximation
scheme can also be applied to CV scores based on some other loss functions, such as $CV(K, h_n)$.

Using Lemma 7.1, we can get a first order approximation to the quantities $\hat\psi_\nu^{(-i)}$ and $\hat\lambda_\nu^{(-i)}$
that depends on the observations through a term that is linear in $\Delta_i(s,t) = \hat C_c(s,t) - \hat C_c^{(-i)}(s,t)$
(for convenience we omit $h_n$ in the notation). Since the latter quantity has a rather simple
expression which involves essentially only the $i$-th observation, this step substantially reduces the
computational burden of the cross-validation procedure.

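Proposition 3.1. For the proposed estimator $\hat C_c$ given by (11), we have,

(i)
$$\hat\psi_{i\nu}^{(-i)} - \hat\psi_{i\nu} := \big(\hat\psi_\nu^{(-i)}(T_{ij}) - \hat\psi_\nu(T_{ij})\big)_{j=1}^{m_i} \approx \big((\hat H_\nu\Delta_i\hat\psi_\nu)(T_{ij})\big)_{j=1}^{m_i}; \qquad (18)$$

(ii)
$$\hat\lambda_\nu^{(-i)} - \hat\lambda_\nu \approx -\mathrm{tr}(\hat P_\nu\Delta_i); \qquad (19)$$

where

(a) $\hat P_\nu = \hat\psi_\nu \otimes \hat\psi_\nu$ where, for $f, g \in L^2([0,1])$, $f \otimes g$ denotes the integral operator with kernel
$f(x)g(y)$ and acts on any $w \in L^2([0,1])$ as $(f \otimes g)(w)(x) = \big(\int_0^1 g(y)w(y)dy\big)f(x)$;

(b)
$$\hat H_\nu(s,t) = \sum_{1\le k\ne\nu\le M}\frac{1}{\hat\lambda_k - \hat\lambda_\nu}\hat\psi_k(s)\hat\psi_k(t) - \frac{1}{\hat\lambda_\nu}\Big(\delta(s-t) - \sum_{k=1}^{M}\hat\psi_k(s)\hat\psi_k(t)\Big)$$
$$= \sum_{1\le k\ne\nu\le M}\frac{\hat\lambda_k}{\hat\lambda_\nu(\hat\lambda_k - \hat\lambda_\nu)}\hat\psi_k(s)\hat\psi_k(t) + \frac{1}{\hat\lambda_\nu}\hat\psi_\nu(s)\hat\psi_\nu(t) - \frac{1}{\hat\lambda_\nu}\delta(s-t), \qquad (20)$$

with $\delta$ being the Dirac $\delta$-function, i.e., $\int\delta(t-u)w(u)du = w(t)$ for any smooth $w \in L^2([0,1])$.
Here $\mathrm{tr}(\hat P_\nu\Delta_i)$ and $(\hat H_\nu\Delta_i\hat\psi_\nu)(t)$ are defined as follows:

(a')
$$\mathrm{tr}(\hat P_\nu\Delta_i) = \int\!\!\int\hat\psi_\nu(u)\Delta_i(u,v)\hat\psi_\nu(v)\,du\,dv; \qquad (21)$$

(b')
$$(\hat H_\nu\Delta_i\hat\psi_\nu)(t) = \int\!\!\int\hat H_\nu(t,u)\Delta_i(u,v)\hat\psi_\nu(v)\,du\,dv. \qquad (22)$$

Also,

(iii)
$$\hat\mu^{(-i)}(t) - \hat\mu(t) = \frac{1}{n-1}\hat\mu(t) - \frac{1}{n-1}\frac{1}{m_i}\sum_{j=1}^{m_i}Y_{ij}K_{h_\mu}(t - T_{ij}), \qquad (23)$$

where $\hat\mu(t) = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{m_i}\sum_{j=1}^{m_i}Y_{ij}K_{h_\mu}(t - T_{ij})$, with $h_\mu$ being the bandwidth for estimating $\mu$ (chosen
separately).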
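
A leave-one-curve-out update of the form (23) holds for any estimate that is an average of per-curve terms, and it can be checked numerically. The following is a minimal sketch under stated assumptions: the simulated data, the Epanechnikov kernel, the bandwidth value and the helper names (`epanechnikov`, `curve_term`) are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np

def epanechnikov(u, h):
    """Scaled kernel K_h(u) = K(u / h) / h with a compactly supported K (assumed choice)."""
    z = u / h
    return np.where(np.abs(z) <= 1.0, 0.75 * (1.0 - z**2), 0.0) / h

def curve_term(t, times, values, h):
    """Per-curve contribution (1 / m_i) * sum_j Y_ij K_h(t - T_ij)."""
    return np.mean(values * epanechnikov(t - times, h))

rng = np.random.default_rng(0)
n, h_mu, t0 = 8, 0.2, 0.4

# Simulated sparse curves: a few noisy measurements per curve (illustration only).
curves = []
for _ in range(n):
    m = int(rng.integers(3, 7))
    T = rng.uniform(0.0, 1.0, size=m)
    Y = np.sin(2.0 * np.pi * T) + 0.1 * rng.standard_normal(m)
    curves.append((T, Y))

terms = np.array([curve_term(t0, T, Y, h_mu) for T, Y in curves])
mu_hat = terms.mean()                       # full-sample estimate at t0

i = 3                                       # curve to leave out
mu_loo_direct = np.delete(terms, i).mean()  # recomputed from scratch
mu_loo_update = mu_hat + mu_hat / (n - 1) - terms[i] / (n - 1)  # update rule
```

The two leave-one-out values agree up to floating point error, so the update costs one extra per-curve term rather than a full recomputation.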

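After we obtain the approximations for $\hat\psi_\nu^{(-i)}$ and $\hat\lambda_\nu^{(-i)}$ from Proposition 3.1, we plug them
back in equation (15) to obtain the final approximation of the CV score, denoted
by $\widetilde{CV}(K,h_n)$:

$$\widetilde{CV}(K,h_n) = \frac{1}{n}\sum_{i=1}^{n}\log|\tilde\Sigma_i| + \frac{1}{n}\sum_{i=1}^{n}\mathrm{tr}\big(\tilde\Sigma_i^{-1}(Y_i - \tilde\mu_i)(Y_i - \tilde\mu_i)^T\big),$$

where $\tilde\Sigma_i = \sum_{\nu=1}^{M}\tilde\lambda_{i\nu}\tilde\psi_{i\nu}\tilde\psi_{i\nu}^T + \hat\sigma^2_{(-i)}I_{m_i}$, with

$$\tilde\lambda_{i\nu} = \hat\lambda_\nu - \mathrm{tr}(\hat P_\nu\Delta_i) \quad\text{and}\quad \tilde\psi_{i\nu} = \hat\psi_{i\nu} + \big((\hat H_\nu\Delta_i\hat\psi_\nu)(T_{ij})\big)_{j=1}^{m_i},$$

and $\tilde\mu_i = (\hat\mu^{(-i)}(T_{ij}))_{j=1}^{m_i}$, with $\hat\mu^{(-i)}$ given by (23). An expression for $\hat\sigma^2_{(-i)} - \frac{n}{n-1}\hat\sigma^2$ is easily
obtained by using (7), (12) and (13). Note that this step does not require any extra computation
beyond that for computing $\hat\sigma^2$.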
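
Once the per-curve covariance matrices and mean vectors have been approximated, the score is an average of Gaussian log-likelihood-type terms, log-determinant plus quadratic form. The sketch below shows only that final assembly step; the function names `build_sigma` and `approx_cv_score` and all numerical inputs are hypothetical, and the perturbation step that produces the approximated eigenvalues and eigenvectors is assumed to have been carried out already.

```python
import numpy as np

def build_sigma(lams, psis, sigma2):
    """Return sum_nu lam_nu * psi_nu psi_nu^T + sigma2 * I.

    psis has shape (m_i, M): column nu holds the nu-th eigenfunction
    evaluated at the m_i design points of one curve.
    """
    return (psis * lams) @ psis.T + sigma2 * np.eye(psis.shape[0])

def approx_cv_score(sigmas, residuals):
    """Average of log|Sigma_i| + r_i^T Sigma_i^{-1} r_i over curves, r_i = Y_i - mu_i."""
    total = 0.0
    for sigma, r in zip(sigmas, residuals):
        sign, logdet = np.linalg.slogdet(sigma)
        if sign <= 0:
            raise ValueError("covariance matrix must be positive definite")
        # tr(Sigma^{-1} r r^T) equals the quadratic form r^T Sigma^{-1} r.
        total += logdet + r @ np.linalg.solve(sigma, r)
    return total / len(sigmas)

# Hypothetical inputs: one curve with m_i = 3 design points and M = 2 components.
psis = np.array([[0.6, 0.1],
                 [0.5, -0.4],
                 [0.3, 0.7]])
lams = np.array([2.0, 0.5])
sigma = build_sigma(lams, psis, sigma2=0.1)
resid = np.array([0.4, -0.2, 0.1])
score = approx_cv_score([sigma], [resid])
```

Using `slogdet` and `solve` avoids forming explicit inverses, which matters little for small $m_i$ but keeps the computation stable when some eigenvalues are close to zero.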

Observe that our objective in minimizing the criterion (15) is to estimate the number
of nonzero eigenvalues and to select an appropriate bandwidth for estimating the eigenfunctions. If
instead the objective is to select an appropriate bandwidth for estimating the covariance kernel, we
can do so by replacing the term $\sum_{\nu=1}^{M}\tilde\lambda_{i\nu}\tilde\psi_{i\nu}\tilde\psi_{i\nu}^T$ in the definition of $\tilde\Sigma_i$ with the leave-one-curve-out
estimate of the covariance kernel, viz. $\hat C_c^{(-i)}$ evaluated at the design points, and minimizing the
corresponding CV criterion. This distinction is important since the theoretical results (Theorems
4.1 and 4.2) show that the optimal rates for the bandwidth $h_n$ are different for estimating the
covariance kernel and its eigenfunctions.

3.2 Representation of $\hat H_\nu\Delta_i\hat\psi_\nu$ and $\mathrm{tr}(\hat P_\nu\Delta_i)$

In order to obtain the approximate CV score efficiently, we need to compute the
quantities $(\hat H_\nu\Delta_i\hat\psi_\nu)(t)$ and $\mathrm{tr}(\hat P_\nu\Delta_i)$ in an efficient manner. Thus we have the following further
approximation based on Lemma 7.1.

Proposition 3.2. We have 



(P,A,^,)(t) 

w{mi) sr^ Afc - 

^ ^ k^u ^'^y^k - K) 

+ ^^--(7.,h„(^))V.(i) 
"-1 A^ 



wirrij) 11" 



^-1 x.9it)frt 

K 



Y^iMsi) + {t- si)xi{si))QhM - thu,hAht) 



w{mi) 
' n-l 



^.^^ A^(Afe — Ay) \u 



+^ E T /' -r M t) E / ^^(^j^;(^^ (g.(.,)A,.(n, si) + S'M)P,M^, si))du 



kj^v Xy{Xk - K) 1=1 



Au 1=1 -J 



9{u) 



{Si{si)Pi^hiu, si) + S'i{si)(32,hiu, si))du 



J^n 



1 1 X^ror ^. .. ^ , oU ^o u ^^V'.(t) 



T^ y,iS^isl)/3l,hii^ Si) + S[{si)(52,h{t, Si))- 

n-^K^i 9(0 

+ {df_,) - ^^d^){ f H,{t,u)du){ f Mu)du); (24) 
(ii)



n-l J g{u) 



where 
(a) 



Lji 



^k -r^ \ , . V^ -,, ,^ / ^, 



(b) 



where, for any two functions $f_1$ and $f_2$ defined on $[0,1]$,

$$G_0(f_1,f_2)(s) = (f_1 * f_2)(s) = \int f_1(x)f_2(s-x)\,dx,$$
$$G_1(f_1,f_2)(s) = (f_1 * (xf_2))(s) = \int f_1(x)(s-x)f_2(s-x)\,dx;$$
7k,kAht) = E /(I - l^^„(t,t^))[(X,(.0 + (^ - sz)X,'(.,))Q.J^; - v)]'^dv; (27) 



(c) 



(i)^„.^vO■')^...^ /"w^ ^„, „)M^M^ 






g(u) q(v) 
• (u - sO^(v - siYQhAsi - u)QhAsi> - v)dudv; (28) 



(d) 



/?i,h(n,s) = / Q^( -s)dv (29) 

/?2,h(n,s) = / ^ {^ - s)QA^ - s)dv. (30) 

In the above, the computation of $\gamma_{k,h_n}(t)$ can be easily done by using the fast Fourier transform.
Also, $\gamma_{k,h_n}(u,t) \approx \gamma_{k,h_n}(u)$ for all $t \in [0,1]$. However, the computation of $\gamma_{kk',h_n}(t)$ involves a
double integration. Thus we need to do some approximations to simplify the computation. A
computationally efficient approximation to $\gamma_{kk',h_n}(\cdot)$ is described in Appendix B. Computation
of $\beta_{1,h}(\cdot)$ and $\beta_{2,h}(\cdot)$ can be done in closed form whenever $Q_h(\cdot)$ has a "nice" functional form
(e.g. a B-spline). From Propositions 3.1 and 3.2 it is clear that most of the components have
already been computed in constructing the estimator, and convolutions can be performed in a
fast manner by using FFT. Thus, the key advantage afforded by Proposition 3.2 is to replace the
expensive computation of double integrals by a much cheaper computation of single integrals and
convolutions. See Appendix F for details of some of these steps.
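
As an illustration of the FFT remark, the convolutions $G_0(f_1,f_2) = f_1 * f_2$ and $G_1(f_1,f_2) = f_1 * (xf_2)$ can be evaluated on a regular grid either as a direct Riemann sum or through zero-padded FFTs. The sketch below is only a generic numerical recipe under stated assumptions: the grid, the two test functions and the helper names `conv_direct` and `conv_fft` are hypothetical, and the functions are treated as zero outside $[0,1]$.

```python
import numpy as np

def conv_direct(f1, f2, dx):
    """Riemann-sum approximation of (f1 * f2)(s) at s = j * dx, via direct summation."""
    return np.convolve(f1, f2) * dx

def conv_fft(f1, f2, dx):
    """Same linear convolution computed with zero-padded real FFTs."""
    n = len(f1) + len(f2) - 1
    spectrum = np.fft.rfft(f1, n) * np.fft.rfft(f2, n)
    return np.fft.irfft(spectrum, n) * dx

# Hypothetical functions sampled on a regular grid over [0, 1].
x = np.linspace(0.0, 1.0, 257)
dx = x[1] - x[0]
f1 = np.exp(-x)
f2 = np.sin(np.pi * x)

g0_direct = conv_direct(f1, f2, dx)       # G_0(f1, f2)
g0_fft = conv_fft(f1, f2, dx)
g1_direct = conv_direct(f1, x * f2, dx)   # G_1(f1, f2) = f1 * (x f2)
g1_fft = conv_fft(f1, x * f2, dx)
```

The direct sum costs on the order of $N^2$ operations for $N$ grid points, while the FFT route costs on the order of $N\log N$, which is the source of the saving when many such convolutions are needed.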

4 Asymptotic properties 

In this section, we present the theoretical properties of the proposed estimators through a large
sample analysis. Our main interest is in the estimation accuracy of the covariance kernel and its
eigenfunctions. The statements of the results and the associated regularity conditions are given
below.

We first state the following assumptions on $g$, the density of the design points; $C$, the covariance
kernel; and $\{\psi_k\}_{k=1}^{M}$, the eigenfunctions.

A1 $g$ is twice continuously differentiable and the second derivative is Hölder($\alpha$), for some $\alpha \in (0,1)$.
Also, the same holds for the covariance kernel $C$.

A2 $\max_k\{\|\psi_k\|_\infty, \|\psi_k'\|_\infty, \|\psi_k''\|_\infty\}$ is bounded.

A3 There are constants $0 < c_0 < c_1 < \infty$ such that $c_0 \le g(\cdot) \le c_1$.

We also assume that the kernels $K(\cdot)$ and $Q(\cdot)$ satisfy conditions B1 and B2, respectively. We
need to make further assumptions about the covariance kernel $C$ and the correlations among the
sample curves. Let $R$ denote an $n \times n$ matrix with $(i,j)$-th entry $\rho_{ij}$. Assume:

C1 $\lambda_1 > \lambda_2 > \cdots > \lambda_M > 0$ and $\lambda_{M+1} = \cdots = 0$. That is, the nonzero eigenvalues are all
distinct and the covariance kernel is of finite dimension.

C2 $\max_{1\le\nu\le M}(\lambda_\nu - \lambda_{\nu+1})^{-1}$ is bounded above.

C3 $\frac{1}{n^2}\mathrm{tr}[(R - I_n)^2] \to 0$ as $n \to \infty$, and $\|R\| \le \kappa_n$ for some $\kappa_n > 0$.

Note that the first part of C3 quantifies the total contribution of the correlations among the sample
curves in the variance of the estimated covariance kernel (see Theorem 4.1). The second part of C3
imposes a stability condition on the correlation matrix $R$. In other words, the sample curves are
"weakly correlated" as $\|R\|$ is bounded by $\kappa_n$. Define $\underline{m} = \min_{1\le i\le n}m_i$ and $\overline{m} = \max_{1\le i\le n}m_i$.
We further assume that

C4 $\overline{m}/\underline{m}$ is bounded above as $n \to \infty$.

We now give the bias and variance of the proposed combined estimator.

Theorem 4.1. Suppose that conditions A1-A3, B1-B2 and C3-C4 hold. Assume further that
$\sigma^2$ is known and $\hat C_*(\cdot) = \tilde C_*(\cdot) - \sigma^2$, where $\tilde C_*(\cdot)$ is defined through (7). Suppose further that in
the definition (11), $\tilde h_n = Ah_n$ for some constant $A > 4(B_K + C_Q)$. Then, with $h_n = o(1)$ and
$nh_n^2 \to \infty$, the estimator $\hat C_c$ satisfies:

$$E[\hat C_c(s,t)] = C(s,t) + O(h_n^2), \qquad (31)$$

$$\mathrm{Var}[\hat C_c(s,t)] = O\Big(\frac{1}{n}\Big) + O\Big(\max\Big\{\frac{1}{nh_n\underline{m}}, \frac{1}{nh_n^2\underline{m}^2}\Big\}\Big) + \Big(\frac{1}{n^2}\sum_{i\ne j}\rho_{ij}^2\Big)O(1), \qquad (32)$$

where the $O(\cdot)$ terms are uniform in $s,t \in [0,1]$.

One implication of Theorem 4.1 is that it gives the rate of convergence of the estimator $\hat\sigma^2$ defined
in (13), as illustrated in the following corollary.

Corollary 4.1. Suppose that conditions A1-A3, B1-B2 and C3-C4 hold, and in the definition
(11), $\tilde h_n = Ah_n$ for some constant $A > 4(B_K + C_Q)$. Then, with $h_n = o(1)$ and $nh_n^2 \to \infty$,

$$E(\hat\sigma^2 - \sigma^2)^2 = O\Big(\frac{1}{n}\Big) + O\Big(\max\Big\{\frac{1}{nh_n\underline{m}}, \frac{1}{nh_n^2\underline{m}^2}\Big\}\Big) + \Big(\frac{1}{n^2}\sum_{i\ne j}\rho_{ij}^2\Big)O(1) + O(h_n^4), \qquad (33)$$

where the $O(\cdot)$ terms are uniform in $s,t \in [0,1]$.

Using Corollary 4.1 and Theorem 4.1, we get a bound on the variance of the proposed estimator
of the covariance kernel when $\sigma^2$ is estimated by $\hat\sigma^2$ defined in (13).

Corollary 4.2. Suppose that conditions A1-A3, B1-B2 and C3-C4 hold, and in the definition
(11), $\tilde h_n = Ah_n$ for some constant $A > 4(B_K + C_Q)$. Then, with $h_n = o(1)$ and $nh_n^2 \to \infty$,

$$\mathrm{Var}[\hat C_c(s,t)] = O\Big(\frac{1}{n}\Big) + O\Big(\max\Big\{\frac{1}{nh_n\underline{m}}, \frac{1}{nh_n^2\underline{m}^2}\Big\}\Big) + \Big(\frac{1}{n^2}\sum_{i\ne j}\rho_{ij}^2\Big)O(1) + O(h_n^4), \qquad (34)$$

where the $O(\cdot)$ terms are uniform in $s,t \in [0,1]$.



Next we state the result about the asymptotic behavior of the estimated eigenfunctions. Let
the loss function for $\hat\psi_\nu$ be the modified $L^2$-loss given by

$$L(\hat\psi_\nu,\psi_\nu) = \|\hat\psi_\nu - \mathrm{sign}(\langle\hat\psi_\nu,\psi_\nu\rangle)\psi_\nu\|_2^2, \qquad (35)$$

where $\|\cdot\|_2$ denotes the $L^2$ norm, and $\langle\hat\psi_\nu,\psi_\nu\rangle = \int_0^1\hat\psi_\nu(x)\psi_\nu(x)dx$. For the statement of Theorem
4.2, we only need to assume that the estimator $\hat\sigma^2$ of $\sigma^2$ satisfies $E(\hat\sigma^2 - \sigma^2)^2 = o(1)$.

Theorem 4.2. Suppose that conditions A1-A3, B1-B2 and C1-C4 hold. Suppose further that
in the definition (11), $\tilde h_n = Ah_n$ for some constant $A > 4(B_K + C_Q)$. If $\overline{m}h_n = o(1)$, $nh_n^2 \to \infty$
and $\kappa_n\overline{m}h_n^{-1}n^{-1/2+\epsilon'} \to 0$ for some $\epsilon' > 0$, then the estimator $\hat\psi_\nu$, which is the eigenfunction
corresponding to the $\nu$-th largest eigenvalue of $\hat C_c$, satisfies: for any arbitrary but fixed $\epsilon > 0$,

sup W.L(4^uAu) < (l + e)-| E 7Y:^^2 



(c?:5)6e '^ \i<dr<Af(^fc-^-) 



l<k=/=u<AI 



where $\Theta$ denotes the class of covariance-density pairs $(C,g)$ satisfying the conditions A1-A3, B1-B2
and C1-C4.
One important implication of Theorems 4.1 and 4.2 is that, if the correlation between sample
curves is "weak" in a suitable sense, then the best rates of convergence for the correlated and i.i.d.
cases are the same. Comparing with the i.i.d. case, we immediately see that, in order for this to
hold, under the conditions of Theorem 4.2, we need

$$\frac{1}{n^2}\sum_{i\ne j}\rho_{ij}^2 = O\Big(\frac{1}{nh_{n,*}\underline{m}}\Big), \qquad (37)$$

where $h_{n,*}$ is the optimal bandwidth choice (at the level of rates of convergence) for the i.i.d. case.
Also, in order to ensure that the optimal rate for the estimate of the covariance kernel in the correlated
case is the same as that in the i.i.d. case, it is sufficient that (by Corollary 4.2)

$$\frac{1}{n^2}\sum_{i\ne j}\rho_{ij}^2 = O\Big(\max\Big\{\frac{1}{nh_{n,*}\underline{m}}, \frac{1}{nh_{n,*}^2\underline{m}^2}\Big\}\Big), \qquad (38)$$

where $h_{n,*}$ is the optimal bandwidth choice for the covariance estimator (at the level of rates of
convergence) for the i.i.d. case. Specifically, for estimating the covariance, $h_{n,*} = (n\underline{m}^2)^{-1/6}$ (by
Theorem 4.1, and under the setting where $\overline{m}h_{n,*} = o(1)$) and for estimating the eigenfunctions,
$h_{n,*} = (n\underline{m})^{-1/5}$ (by Theorem 4.2). Thus, one notices that the optimal bandwidths for estimating
the covariance and its eigenfunctions are different, at least in the case where $\overline{m}$ can only grow rather
slowly with $n$. Combining the lower bound given by Theorem 2 of Hall et al. (2006) and the upper
bound from Theorem 4.2, it follows that when $\overline{m}$ is bounded, the rate of convergence of the $L^2$-risk
is optimal if (37) holds. Thus, under this setting the proposed estimator of the eigenfunctions is
optimal even in the situation when the sample curves are weakly correlated. Similarly, under the
setting of Theorem 4.1, if (38) holds, then the $L^2$-risk of the proposed estimator of covariance also
has the optimal rate under an appropriate choice of bandwidth.
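
The two bandwidth rates can be seen from a simple bias-variance balance. The sketch below is a heuristic illustration only: it assumes, with all constants set to one, that the squared bias is $h^4$, that the leading variance term for the eigenfunctions is $1/(nmh)$, and that the leading variance term for the covariance kernel (when $mh$ is small) is $1/(nm^2h^2)$. The function names and the values of `n` and `m` are hypothetical.

```python
import numpy as np

def eigen_risk(h, n, m):
    """Heuristic risk proxy for eigenfunction estimation: h^4 + 1 / (n m h)."""
    return h**4 + 1.0 / (n * m * h)

def cov_risk(h, n, m):
    """Heuristic risk proxy for covariance estimation: h^4 + 1 / (n m^2 h^2)."""
    return h**4 + 1.0 / (n * m**2 * h**2)

def argmin_on_grid(risk, n, m):
    """Minimize a risk proxy over a fine logarithmic grid of bandwidths."""
    grid = np.exp(np.linspace(np.log(1e-4), 0.0, 200001))
    return grid[np.argmin(risk(grid, n, m))]

n, m = 1000, 5
h_eig = argmin_on_grid(eigen_risk, n, m)
h_cov = argmin_on_grid(cov_risk, n, m)

# Closed-form minimizers of the two proxies (from setting the derivative to zero).
h_eig_closed = (4 * n * m) ** (-1 / 5)
h_cov_closed = (2 * n * m**2) ** (-1 / 6)
```

Up to constants, the minimizers scale as $(nm)^{-1/5}$ and $(nm^2)^{-1/6}$ respectively, so a bandwidth tuned for one target is not rate-optimal for the other.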

Another important point is that the conditions in Theorem 4.2, specifically that $\overline{m}h_n = o(1)$,
$nh_n^2 \to \infty$, and $\overline{m}\kappa_nh_n^{-1}n^{-1/2+\epsilon'} = o(1)$, which imply that $\overline{m} = o(n^{1/4})$, are not the most general
conditions. We conjecture that (36) holds under weaker conditions. Indeed, in the i.i.d. case,
(36) holds (without the second term on the RHS) under a much wider range of possible values
of $\overline{m}$, as indicated by the following result. The following result gives a lower bound on the rate
of convergence of the first eigenfunction when $\overline{m} \to \infty$ under the i.i.d. setting. This bound is a
refinement over an analogous result (Theorem 2) in Hall et al. (2006), even though the latter holds
for all eigenfunctions. Notice that this lower bound, together with the upper bound elucidated in
the paragraph following Theorem 4.2, implies that, at least for the first eigenfunction, the best rate
of convergence for eigenfunctions, viz. $O((n\overline{m})^{-4/5})$, is optimal when $\overline{m} \to \infty$ at a faster rate and
if (37) holds.

Theorem 4.3. Let $\mathcal{C}$ denote the class of covariance kernels $\Sigma(\cdot,\cdot)$ on $[0,1]^2$ with rank $\ge 1$, and
nonzero eigenvalues $\{\lambda_j\}_{j\ge1}$ satisfying $C_0 \ge \lambda_1 > \lambda_2 \ge 0$ with $\lambda_1 - \lambda_2 \ge C_1$, and the first
eigenfunction $\psi_1$ being twice differentiable and satisfying $\|\psi_1''\|_\infty \le C_2$, for some constants $C_0, C_1, C_2 > 0$.
Also, let $\mathcal{G}$ denote the class of continuous densities $g$ on $[0,1]$ such that $c_1 \le g \le c_2$ for some
$0 < c_1 < 1 < c_2 < \infty$. Suppose that we observe data according to model (1), where $X_i(\cdot)$ are
i.i.d. Gaussian processes with mean 0 and covariance kernel $\Sigma$. Also suppose that the number of
measurements rm 's satisfy rn<mi<m, for m > rn> A, such that m/m < C3 for some C3 < 00, 
and m = o{n?'^). Let V denote the space of such designs D = {ttij}"^]^. Then for sufficiently large 

$n$, for any estimator $\hat\psi_1$ with $L^2$ norm one, the following holds:

$$\sup_{D\in\mathcal{D}}\ \sup_{\Sigma\in\mathcal{C}}\ \sup_{g\in\mathcal{G}} E\|\hat\psi_1 - \psi_1\|_2^2 \ge C_4(n\overline{m})^{-4/5}. \qquad (39)$$

The proof of Theorem 4.3 is given in Appendix G.

5 Outline of the Proof of Theorems 4.1 and 4.2 

In this section, we briefly describe the main ideas leading to the proofs of Theorems 4.1 and 4.2. The
technical arguments are given in the appendices. The proof of Theorem 4.1 uses direct computation
(Appendices C and D). The basic idea in the computation of the moments is to treat the diagonal
and the off-diagonal parts of $\hat C_c(\cdot,\cdot)$ separately. The proof of Theorem 4.2 relies heavily on an
application of Lemma 7.1. In view of this, the key quantity in the derivation of the asymptotic risk is
$E\|H_\nu\hat C_c\psi_\nu\|_2^2$, where $\|f\|_2^2$ denotes $\int_0^1f^2(x)dx$ for a function $f \in L^2([0,1])$.
Once we obtain an expression for this (as given in Section 5.1), we use a probabilistic bound on
the operator norm of the difference between estimated and true covariance kernels to complete the
proof. Proofs of Theorems 4.1 and 4.2 require repeated computation of mixed moments of correlated
Gaussian random variables. The details of all these computations are given in the appendices.

5.1 Asymptotic risk for estimating $\psi_\nu$

The key result in this section is the following proposition.
Proposition 5.1. Under the assumptions of Theorem 4.2, we have

EllF.a^.lli < (l + e)i( Y. ^'^' 



n \ ^-^ (Xk — Ai/)^ 






+°(''"' + °(;s;^l (^o' 



for any arbitrary but fixed $\epsilon > 0$.



Here we briefly describe the main idea of the proof. For convenience of exposition, throughout 
we replace max{C(^), /i^} in the definition (11) by C(^). Using appropriate exponential in- 
equalities for C=i,(t), it can be shown that, asymptotically this does not make any difference as long 
as $\min_{t\in[0,1]}C(t,t) \ge c_3$ for some $c_3 > 0$. Also, for computational purposes, it is helpful to consider
the unsmoothed version (10) of the kernel $W$, and take $\tilde h_n = Ah_n$, where $A > 4(B_K + C_Q)$. The
advantage of this is in being able to deal with the contributions from the diagonal and off-diagonal
parts of the estimator separately. Since the definition of $H_\nu$ involves the Dirac $\delta$-function, we need
to account for the contribution of terms involving $\delta$ carefully. The estimation error in $\hat\sigma^2$ also plays
a role, and is taken into account separately. The main decompositions that facilitate the
computations are given through (55) - (57) in Appendix D. The last bound reduces the task of bounding
$E\|H_\nu\hat C_c\psi_\nu\|_2^2$ to that of bounding $E\|H_\nu\tilde C_c\psi_\nu\|_2^2$, with $\tilde C_c(\cdot,\cdot)$ as described in Appendix D. Note
also that, if $\sigma^2$ is assumed to be known, then the decomposition (57) is not required, and we can
get rid of the multiplicative factor $(1 + \epsilon)$ in the expression (36) for the risk in Theorem 4.2.

5.2 Norm bound on $\hat C_c - E\hat C_c$

To complete the proof of the theorems, we need to find a probabilistic bound for $\|\hat C_c - E\hat C_c\|$,
where $\|\cdot\|$ denotes the operator norm. We shall first find a bound on the sup norm of $\hat C_c - E\hat C_c$, and
then we can bound the operator norm $\|\hat C_c - E\hat C_c\|$ via the inequality $\|\hat C_c - E\hat C_c\| \le \|\hat C_c - E\hat C_c\|_F$,
where $\|\cdot\|_F$ denotes the Hilbert-Schmidt norm. This is in turn due to the inequality

$$\|\hat C_c - E\hat C_c\|_F \le \sup_{x,y\in[0,1]}|\hat C_c(x,y) - E\hat C_c(x,y)| =: \|\hat C_c - E\hat C_c\|_\infty.$$

Note that, by piecewise differentiability of the estimate $\hat C_c$, in order to provide exponential bounds
for the deviations of $\|\hat C_c - E\hat C_c\|_\infty$, it is enough to provide exponential bounds for the fluctuations
of $|\hat C_c(s,t) - E[\hat C_c(s,t)]|$ for a finite (but polynomially growing with $n$) number of points $(s,t) \in [0,1]^2$.
Thus, we fix an arbitrary $(s,t) \in [0,1]^2$ and derive an exponential inequality for the deviation of the
estimate at this point. For simplifying the computations, without loss of generality, we assume that
$g$ is the density of the Uniform(0,1) distribution. Then we have the following proposition.
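
The chain of inequalities between the operator norm, the Hilbert-Schmidt norm and the sup norm of a kernel on $[0,1]^2$ can be illustrated by discretization. The sketch below is a numerical check, not a proof: the kernel `D` is a hypothetical symmetric stand-in for the difference $\hat C_c - E\hat C_c$, and the integral operator is approximated on a midpoint grid of `N` points, so that the operator and Hilbert-Schmidt norms become matrix norms divided by `N`.

```python
import numpy as np

N = 200
x = (np.arange(N) + 0.5) / N   # midpoint grid on [0, 1]

# Hypothetical symmetric kernel D(x, y), standing in for C_hat - E[C_hat].
D = 0.05 * np.cos(3.0 * np.pi * np.subtract.outer(x, x)) * np.minimum.outer(x, x)

# Discretized norms of the integral operator with kernel D on L^2([0, 1]):
operator_norm = np.linalg.norm(D, 2) / N   # largest singular value, scaled by the grid weight
hs_norm = np.linalg.norm(D, "fro") / N     # square root of the double integral of D^2
sup_norm = np.abs(D).max()                 # sup of |D(x, y)| over the grid
```

On this grid one finds `operator_norm <= hs_norm <= sup_norm`, which mirrors the reduction used in the text: a uniform bound on the kernel controls the operator norm.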

Proposition 5.2. Under the conditions of Theorem 4.2, given $\eta > 0$, there is a $c_\eta > 0$ such that,
for every fixed $s,t \in [0,1]$,



P MCc(s,t) -E(Cc(s,t))| > c^m^nir-^j < n-\ (41) 

The proof of Theorem 4.2 then follows by noticing first that by Lemma 7.1 and the fact that
$\|\hat\psi_\nu\|_2 = \|\psi_\nu\|_2 = 1$,



EL(V., V.) < E II H.Cc^l^, M (1 + Sn,r,) + 2P II Cc - E(Cc) II > c; 



mK. 



Ilogn 



n\ 



for some $\eta > 0$, $c'_\eta > 0$ and $\delta_{n,\eta} \to 0$ appropriately chosen, and then using Propositions 5.1 and 5.2.

5.3 Connection to parametric rate for "purely functional" data 

It is instructive to compare the optimal rate for our procedure with that obtained by Hall et al.
(2006). We can regard the first line on the right hand side of (40) as the parametric component of
the risk and the second line as the nonparametric component. If we take $h = O(n^{-1/5})$, then for
bounded $\overline{m}$ we get the optimal nonparametric rate. For consistency of $\hat\psi_\nu$ in the $L^2$ sense, we clearly
need $\frac{1}{n^2}\sum_{i\ne j}\rho_{ij}^2 = o(1)$ (used in Theorem 4.2). If the number of measurements per curve increases
with increasing sample size, then the rate also improves. But there is no result about optimality.

When the observations are i.i.d., it can be checked by using a modification to the proof of
Proposition 5.2 that, if $m \to \infty$, $h \to 0$, such that $(mh)^{-1} = o(1)$ and $h = o(n^{-1/4})$, we obtain
the parametric rate for the $L^2$-risk of $\hat\psi_\nu$ (as indicated in Hall et al., 2006). In other words, under
that setting there is asymptotically no difference between the risk of estimating the eigenfunctions
from data obtained with observational noise and measured at randomly distributed points, and
that from data measured on the continuum without noise. Indeed, such a scenario is possible if
$m^{-1} = o(n^{-1/4-\epsilon})$, for an $\epsilon > 0$. Then, by taking $h_n = o(n^{-1/4})$, and assuming that either $\sigma^2$
is known, or an estimator $\hat\sigma^2$ satisfying $|\hat\sigma^2 - \sigma^2| = O_P(h_n^2)$ is available, we attain the conditions
mentioned above. We conjecture that the same result holds even when the observations are "weakly"
correlated.

6 Discussion 

In this paper, we presented a procedure for estimating the covariance kernel and its eigenfunctions 
from sparsely observed, noise corrupted and correlated functional data. The estimator for the co- 
variance kernel is based on merging two separate estimators: (i) the estimator of the off-diagonal 
part based on computing linearized empirical covariances of the smoothed version of individual 
sample curves; (ii) the estimator of the diagonal part based on linearized kernel smoothing of the 
empirical variances. The importance of this modification to the naive kernel smoothing approach, 
especially in the scenario when the number of design points per curve is small, is demonstrated 
through an asymptotic bias analysis. The linearized version of the kernel smoothing helps in re- 
ducing bias, while controlling the variance, and is computationally appealing. Asymptotic risk 
behavior of the proposed estimators is studied under the assumption that the sample curves have a 
"separable covariance" structure and are "weakly" correlated. Exact quantification of the asymp- 
totic risk for the eigenfunctions is obtained under the Gaussian setting (Theorem 4.2). It is also 
shown that the $L^2$-risk for the eigenfunctions achieves the optimal rate, under an appropriate choice
of the bandwidth, when the number of measurements per curve is bounded. Also, in the i.i.d. case, 
we obtain a lower bound on the rate of convergence for estimating the first eigenfunction that is 
sharper than bounds in the existing literature, which proves the rate-optimality of our estimator in 
a wider regime. Finally, we propose a computationally tractable model selection procedure based 
on minimizing an approximation to the leave-one-curve-out cross validation score that uses the 
empirical Kullback-Leibler loss. We also show that in the context of estimating the covariance 
kernel or its eigenfunctions, it has clear advantages over the commonly used prediction error loss. 

The proposed procedure for estimation and model selection is easily implementable and com- 
putationally more tractable as compared to some of the existing methods. Moreover, due to the 
linear structure of the pre-smoothing of individual curves, our estimator is stable. Furthermore, 
the linear structure of the proposed estimator also allows for a simple approximation to the cross 
validation score. Finally, even though the results are proved under Gaussianity of the noise process, 
it can be shown that at the level of rates of convergence, the upper bounds hold under sufficient 
moment conditions on the noise, and hence the estimator is expected to be robust to distributional 
assumptions. 

There are a few aspects of the estimation procedure that need further exploration. In the 
asymptotic analysis, we assumed that g, the density function of the design points, is known. In 
practice it has to be estimated from the data. Additional computations are needed to show that 
the results derived here hold under that setting as well. It will be useful also to study its impact 
on the estimation procedure through simulation studies, and in real data applications when the 
assumption of exact randomness of the design points may be violated. 

A natural generalization of the framework studied in this paper will be when the principal 
component scores jointly form a stationary vector autoregressive process. Under such a setting, we 
would like to extend the estimation and model selection procedures described here to exploit the 
special structures of such processes. This is likely to summarize the statistical properties of some 
real-life phenomena and also help in model building and prediction, for example in spatio-temporal
models when the covariance is not separable. 

7 Appendix 

Appendix A 

Perturbation of eigen-structure 

The following lemma is a modified version of a similar result in Paul and Johnstone (2007). Several
variants of this lemma appear in the literature (see, e.g., Kneip and Utikal (2001), Cai and Hall
(2006)), and most of them implicitly use the approach taken in Kato (1980). In the following we
use $\|A\|$ to denote the operator norm of an operator $A$, i.e., the largest singular value of $A$.

Lemma 7.1. Let $A$ and $B$ be two symmetric Hilbert-Schmidt operators acting on $L^2([0,1])$. Let the
eigenvalues of the operator $A$ be denoted by $\lambda_1(A) \ge \lambda_2(A) \ge \cdots$. Set $\lambda_0(A) = \infty$ and $\lambda_\infty(A) = -\infty$.
For any $r \ge 1$, if $\lambda_r(A)$ is a unique eigenvalue of $A$, i.e., if $\lambda_r(A)$ is of multiplicity 1, then, denoting
by $p_r$ the eigenfunction associated with the $r$-th eigenvalue,

$$p_r(A+B) - \mathrm{sign}(\langle p_r(A+B), p_r(A)\rangle)\,p_r(A) = -H_r(A)Bp_r(A) + R_r$$

where $H_r(A) := \sum_{s\ne r}\frac{1}{\lambda_s(A) - \lambda_r(A)}P_{E_s}(A)$ and $P_{E_s}(A)$ denotes the orthogonal projection operator
onto the eigen-subspace $E_s$ corresponding to eigenvalue $\lambda_s(A)$ (possibly multi-dimensional). Define
$\delta_r$ and $\bar\delta_r$ as

$$\delta_r := \frac{1}{2}\big[\|H_r(A)B\| + |\lambda_r(A+B) - \lambda_r(A)|\,\|H_r(A)\|\big], \qquad \bar\delta_r := \frac{\|B\|}{\min_{1\le j\ne r<\infty}|\lambda_j(A) - \lambda_r(A)|}.$$

Then, the residual term $R_r$ can be bounded as

$$\|R_r\| \le \min\Big\{10\bar\delta_r^2,\ \|H_r(A)Bp_r(A)\|\Big[\frac{2\delta_r(1+2\delta_r)}{1 - 2\delta_r(1+2\delta_r)} + \frac{\|H_r(A)Bp_r(A)\|}{(1 - 2\delta_r(1+2\delta_r))^2}\Big]\Big\},$$

where the second bound holds only if $\delta_r < \frac{\sqrt{5}-1}{4}$.

In addition, if 1 < ri < r2 are such that Ar^(A) > Xrj^+i{A) = ■ ■ ■ = Xr2(A) > Xr2+i{A), then 



1-2 



^ {X,{A + B)- Xk{A)) = tr{Pe^^ {A)B) + Rr 



k=r\ 



where Ps^ {A) is the orthogonal projection operator of A corresponding to the eigenvalues Xr^ {A), . . . , Xr^ {A), 
and the residual Rri,r2 satisfies 

,- , . ^ 6 II 5 IP 

|^r-i,r2l < (^2 -ri + 1) 



mini<j^r<oo \Xj{A) - Xr{A)\ 
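
The first-order expansion in Lemma 7.1 has a direct finite-dimensional analogue that is easy to check numerically. The sketch below works with symmetric matrices instead of Hilbert-Schmidt operators; the matrices `A` and `B`, the perturbation size `eps` and the helper name `first_order_eigvec_update` are hypothetical choices for illustration.

```python
import numpy as np

def first_order_eigvec_update(A, B, r):
    """Return (p_r(A), p_r(A) - H_r(A) B p_r(A)) for symmetric A with a simple r-th eigenvalue.

    Eigenvalues are indexed in decreasing order, r = 0 being the largest.
    """
    lam, vecs = np.linalg.eigh(A)
    order = np.argsort(lam)[::-1]
    lam, vecs = lam[order], vecs[:, order]
    p = vecs[:, r]
    # H_r(A) = sum_{s != r} P_s / (lambda_s - lambda_r), with P_s the rank-one projections.
    H = np.zeros_like(A)
    for s in range(len(lam)):
        if s != r:
            H += np.outer(vecs[:, s], vecs[:, s]) / (lam[s] - lam[r])
    return p, p - H @ B @ p

# Hypothetical operator A with distinct eigenvalues and a small symmetric perturbation B.
A = np.diag([5.0, 3.0, 1.0, 0.5])
S = np.array([[0.0, 1.0, 0.5, 0.2],
              [1.0, 0.0, 0.3, 0.1],
              [0.5, 0.3, 0.0, 0.4],
              [0.2, 0.1, 0.4, 0.0]])
eps = 1e-3
B = eps * S

r = 0
p, p_first_order = first_order_eigvec_update(A, B, r)

# Exact eigenvector of A + B, sign-aligned with p_r(A) as in the lemma.
lam2, vecs2 = np.linalg.eigh(A + B)
q = vecs2[:, np.argsort(lam2)[::-1][r]]
q = q * np.sign(q @ p)

err_zeroth = np.linalg.norm(q - p)              # error when the perturbation is ignored
err_first = np.linalg.norm(q - p_first_order)   # residual after the first-order correction
```

The residual after the first-order correction is of the order of the squared perturbation size, which is the behavior that the bound on $R_r$ quantifies.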
Large deviations of quadratic forms

The following lemmas are from Paul (2004). Suppose that $\Phi : \mathcal{X} \to \mathbb{R}^{n\times n}$ is a measurable function.
Let $Z$ be a random variable taking values in $\mathcal{X}$.

Lemma 7.2. Suppose that X andY are i.i.d. Nn{0,I) and are independent ofZ. Then for every 
L > and < S < 1, for all < t < j^L, 

n^\X^HZ)Y\ > t, II $(Z) ||< L) < 2exp (-^1^ 

Lemma 7.3. Suppose that X is distributed as Nn{0,I) and is independent of Z. Also assume that 
$(z) = $^(z) for all z £ X. Then for every L > and < S < 1, for all < t < ^L, 

P(-|X^$(Z)X - Tr{^{Z))\ > t, II $(Z) ||< L) < 2exp f- ^-^'^ 
n \ 4L^ 

Computation of conditional mixed moments 

In order to calculate the bias and variance of the proposed estimator, we need to compute the
conditional expectations $E(Y_{i_1j_1}Y_{i_1j_1'}Y_{i_2j_2}Y_{i_2j_2'} \mid T_{i_1}, T_{i_2})$ for various choices of $i_1, i_2, j_1, j_1', j_2, j_2'$. We
shall use the following well-known result, which is a special case of the Wick formula (Nica and Speicher,
2006, p. 129) for computation of mixed moments of a Gaussian random vector.

Lemma 7.4. If $W_1, W_2, W_3$ and $W_4$ are jointly Gaussian with mean zero and covariance matrix $\Sigma$,
then

$$E(W_1W_2W_3W_4) = \Sigma_{12}\Sigma_{34} + \Sigma_{13}\Sigma_{24} + \Sigma_{14}\Sigma_{23}. \qquad (42)$$

We shall use the formula to compute the above mixed moments with the observation that
$\mathrm{Cov}(X_{i_1j_1}, X_{i_2j_2} \mid T_{i_1}, T_{i_2}) = \rho_{i_1i_2}C(T_{i_1j_1}, T_{i_2j_2})$. The details of this computation in various generic
cases are given in Appendix F.
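
The fourth-moment identity (42) can be verified exactly, without simulation, by writing $W = AZ$ with $Z$ a vector of independent standard normal variables and expanding the product using only the moments $E Z^2 = 1$ and $E Z^4 = 3$. The sketch below does this for a hypothetical $4 \times 4$ matrix `A`; the function names are illustrative.

```python
import itertools
import numpy as np

# Moments E[Z^k] of a standard normal variable for k = 0, ..., 4.
STD_NORMAL_MOMENTS = {0: 1.0, 1: 0.0, 2: 1.0, 3: 0.0, 4: 3.0}

def fourth_moment_exact(A):
    """E[W1 W2 W3 W4] for W = A Z, Z ~ N(0, I_d), using only moments of independent N(0, 1)."""
    d = A.shape[1]
    total = 0.0
    for idx in itertools.product(range(d), repeat=4):
        # E[Z_a Z_b Z_c Z_d] factorizes over distinct coordinates by independence.
        counts = np.bincount(idx, minlength=d)
        ez = np.prod([STD_NORMAL_MOMENTS[int(c)] for c in counts])
        if ez != 0.0:
            total += A[0, idx[0]] * A[1, idx[1]] * A[2, idx[2]] * A[3, idx[3]] * ez
    return total

def wick_formula(S):
    """Right hand side of (42) for a 4 x 4 covariance matrix S."""
    return S[0, 1] * S[2, 3] + S[0, 2] * S[1, 3] + S[0, 3] * S[1, 2]

# Hypothetical mixing matrix; the covariance of W = A Z is A A^T.
A = np.array([[1.0, 0.2, 0.0, 0.5],
              [0.3, 1.5, 0.4, 0.0],
              [0.0, 0.7, 0.9, 0.1],
              [0.6, 0.0, 0.2, 1.1]])
Sigma = A @ A.T

lhs = fourth_moment_exact(A)
rhs = wick_formula(Sigma)
```

The two sides agree up to floating point error, since the only nonvanishing index patterns in the expansion are the three pairings that appear in (42).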

Appendix B 

In Appendix B and the following appendices, we shall often write $h$ and $\tilde h$ to denote $h_n$ and $\tilde h_n$,
respectively, and we shall drop the subscript $h_n$ from the covariance estimates. For example, $\hat C_c$
will be used to denote $\hat C_{c,h_n}$.

Proof of Proposition 3.1

This is a straightforward application of Lemma 7.1, by taking the estimated covariance kernel $\hat C_c$
as operator $A$ and $-\Delta_i = \hat C_c^{(-i)} - \hat C_c$ as operator $B$. Note that in (20) the last term corresponds
to the zero eigenvalues of $\hat C_c$.
Proof of Proposition 3.2

We can express Aj(n, u) as Ai{u,v) + Ri{u,v) + (Sf^i) — :^^cr^) ~ T^Cc{u,v), where 



{l-Wr(u,v)) 



w{mi) 1 
n-l g{u)g{v) 



i^n 



^ (XM) + (^ - si)X'M)){XM') + {v- si)X[{sv))Q^^{u - si)Q„^{v - s. 



Ll'=l 



1 1 



+^/xJ^'^);r3T77^±^E 






QhJ^-^0 (43) 



and Ri{u,v) equals (with z denoting ^^^) 

n 

+(1 - w-^ju,v))Y,^^ [(fii~j\^) - Kj{^)){^^*M - KM) 

+ {Ji^^j\v) - Jl*j{v)){n^j{u) - fi*j{u)) - (/2l7/-'(n) - Jl*,j{u)){Jl[~f (v) - fi*,j{v)) 



z- si 



V- - -yyv-y ^. '-J ^1 

1 JL 1 I^ , , _^n_ 



1 = 1 



1 " 1 "^-^ ^" 



where, the kernel Ks,i{') = ^s,i,h„{') is defined as 



KsA^) = T- 
K 



"-71 \ "-n / 'T'n 



(44) 



for s G [0, 1] and / = !,..., L.„; and for any function /, 



U,j{x) = ig{x))-' Y.(fj{si) + {x- si)r^{si))Q^Sx - si) 



1=1 



with f,{s) := ^ Zlli f{T,k)KhSs - T.k); and 



^jix) = {g{x)) ^ J2^ej{si) + (x - si)e'j{si)]Qh„ix - s/) 



1=1 
with Sjis) = ^ Y.TI1 ^jkKhAs - Tjk).

Since HiyCc4'u = XuHj^ipiy = 0, it follows from the representation of Aj that 

where l{u,v) = l{o<n,i)<i}- It is easy to see from the expression for Ri{u,v) and (23) that for 
reasonable choices of h^, the contribution of Ri{u, v) can be ignored, since it is of a smaller asymp- 
totic order (in fact can be shown to be op{n~^)). Hence, we end up with the approximation 

Thus, we can separate out the first term on the RHS of (43) into two parts - one with multiplier 
1, and the other with multiplier W^ {u,v). Then using (22) and the second representation of Hy 
in (20), we obtain the expressions in the first four lines on the RHS of (24). Next, using the fact 
that Wj^ {u,v) « lnu-v\<^hn}^ ^^"^ using the approximations ipuiv) ~ il^u{u) and g{^^^) « g{u) 
on the interval [u — -^hn,u + -jhn], we obtain the last three terms on the RHS of (24). Now, using 
(21), noting that tr {PyCc) = A^, and following similar arguments, we have (25). 

Approximation o^^k.k'.hS^) 

First, to fix notation, suppose that /i„ = Ahn for some constant ^ > 0. Then, by definition of 
Wr , and the symmetry of Q, the integral appearing in (28) can be expressed as (ignoring the 
boundaries) 

4,kk,H^ ■■= [ |^(^ - ^i^QhSsi - u) j^^'j t^{v - s,y'Q,Jv - s,)dvdu. (45) 

Noticing that, on [u — ^hn,u + ^hn], [^ can be approximated as V^ , we can approximate 
the inner integral (with respect to v) by 

[v - si>y Qf,Jv - si>)dv 



g{u) Ju-Ah,, 



M "*" — y—:— / w^ Q{w)dw, (setting w 

g{u) j!l_IiL-4 



_ hr, 

hn 2 



_. ,.'.iVvMg^Q (!i^) =: e^M^G^^,., (n). 
g{u) ^ \ hn J "" g{u) ^ '''''"' 



Substituting this in (45), we have the approximation 



d" 



{-1)^}C^ f '^^^^^^^Gl,,,S-)^s, - uYQ^M - -)du 



"ll'M'-An \ -J -n 1^ ^2(-^) 



i-^'h^^'G, ^^j Gf,,,,^,Q,] is,) =: 4m';.„, (46) 



by definition of Gj{fi, /2)(-)i J = 0) 1- Since 



u~ sii A u — si' A 

tin ^ '^n ^ 



n [-Cq, CQ]=(t>^\u- si>\ > {Cq + -)hn, 
then G^, {j-f^j = if |m - s;'| > {Cq + 4)^n- Furthermore, Qh^{u - s;) = if \u - s/| > Cqhn.

This means that, if either |ii — s;| > Cqhn, or \u — si'\ > {Cq + -g)/!^ then the integrand in the 
first step of (46) is zero. So the domain of integration is, effectively, [si — Cqhn, s/ + Cqhn] n [s// — 
{Cq + -2)hn,si' + {Cq + 2")^n.]- This imphes that if \si — sii\ > {2Cq + 2")^n; then the effective 
domain of integration is empty, meaning that 

—jj' ^ 

dii',kk';hr, = ^ if \si - si>\ > {2Cq + -)hn. 

If $Q(\cdot)$ is chosen to be a centered cubic B-spline (so that $C_Q = 2$), we can compute the function $G$ defined above explicitly,
without having to perform a numerical integration (Appendix F).

Appendix C 

In the following, we often drop the subscript $n$ from $h_n$ for simplicity and sometimes we even drop
the subscript $h$ from the notation.

Proof of Proposition 2.1 

By elementary calculations, and supposing that $m_i \ge 2$ for each $1 \le i \le n$, we have

E[X,{s)Xi{t)] 
= ^ V E[Y,,Y,,,^K{'—^)K{'-^)] 



mfhl 



f{C{n, u) + a')K{'—^)K{i-^)du + 



miimi — 1) 1 11', ^ ^^,s — u. ^^,t — u. , , 
^ ^ ' ' C{u,v)K{—- — )K{——)dudv 



mf 






11/* ■/■ 

— 1— {C{t + hnU,t + hnu) +a^)K{-u)K{^ u)du 

rrii hn J hn 

Tfl ' — 1 / / 

H ^ / / C{s + hnU,t + hnv)K{—u)K{—v)dudv 

rrii I I 



1 



rriikn 



2n / ,,. s ^.,S — t ,, , -^/ , _^ I ,,, ^ ^^,S — t 



{C{t) + a') / K{-u)K{— u)du + hnC {t) / uK{-u)K{— u)du + 0{hi) 



+ (l - — j C{s,t) I I K{-u)K{-v)dudv 

+ U-^j/i„ /" f[Cs{s,t)u + Ct{s,t)v]K{-u)K{-v)dudv + 0{hl), (47) 

where the last step is by Taylor series expansions. Now, noticing that $K$ is symmetric about 0,
$\int K(x)dx = 1$ and $\int xK(x)dx = 0$, (5) and (6) follow from (47) after simplifications.
Asymptotic pointwise bias (31)

We first compute the expected value of the estimate described by (11). For simplicity of notations, 
we express Xi{si) + {s — si)X'-{si) by Xij{s). Observe that 



1 ™' 1 



i=i 



^.,si — Tij ^ s — si^/.S] — Tij . 
K{ ^ ) + ^—^K'{- ^ 



h 



h 



h 



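Let the support of the kernel $K(\cdot)$ be denoted by $[-B_K, B_K]$. Then, for each fixed $j = 1,\ldots,m_i$ and
$i = 1,\ldots,n$,

\begin{align*}
&E\Big[\frac{Y_{ij}^2}{h^2}\Big(K\Big(\frac{s_l - T_{ij}}{h}\Big) + \frac{s - s_l}{h}K'\Big(\frac{s_l - T_{ij}}{h}\Big)\Big)\Big(K\Big(\frac{s_{l'} - T_{ij}}{h}\Big) + \frac{t - s_{l'}}{h}K'\Big(\frac{s_{l'} - T_{ij}}{h}\Big)\Big)\Big] \\
&\quad = \int (C(u,u)+\sigma^2)g(u)\frac{1}{h^2}\Big(K\Big(\frac{s_l - u}{h}\Big) + \frac{s - s_l}{h}K'\Big(\frac{s_l - u}{h}\Big)\Big)\Big(K\Big(\frac{s_{l'} - u}{h}\Big) + \frac{t - s_{l'}}{h}K'\Big(\frac{s_{l'} - u}{h}\Big)\Big)du, \tag{48}
\end{align*}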



which is $0$ if $|s_l - s_{l'}| > 2B_K h$, since this implies that $K(\frac{s_l - u}{h})K(\frac{s_{l'} - u}{h}) = 0$ for all $u \in \mathbb{R}$. If $|s_l - s_{l'}| \le 2B_K h$,
there is a nonzero contribution of the term (48) in $E[\tilde X_{i,l}(s)\tilde X_{i,l'}(t)]Q_h(s - s_l)Q_h(t - s_{l'})$ only
if $|s - t| \le 2(B_K + C_Q)h$, where $\mathrm{supp}(Q) = [-C_Q, C_Q]$. Thus, if $A > 4(B_K + C_Q)$, then for
$|s - t| > \frac{A}{2}h$, we have



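\begin{align*}
&w(m_i)E(\tilde X_{i,l}(s)\tilde X_{i,l'}(t)) \\
&\quad = \iint C(u,v)g(u)g(v)\frac{1}{h^2}\Big(K\Big(\frac{s_l - u}{h}\Big) + \frac{s - s_l}{h}K'\Big(\frac{s_l - u}{h}\Big)\Big)\Big(K\Big(\frac{s_{l'} - v}{h}\Big) + \frac{t - s_{l'}}{h}K'\Big(\frac{s_{l'} - v}{h}\Big)\Big)du\,dv \\
&\quad = \iint C(s_l + xh, s_{l'} + yh)\,g(s_l + xh)\,g(s_{l'} + yh)\Big(K(x) + \frac{s - s_l}{h}K'(-x)\Big)\Big(K(y) + \frac{t - s_{l'}}{h}K'(-y)\Big)dx\,dy. \tag{49}
\end{align*}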



We assume that the conditions in Section 4 hold. Then, using the representation (49) and the
calculations done in Appendix F, we get an expression for the asymptotic bias in estimating $C(\cdot,\cdot)$
as a function of the bandwidth $h = h_n$. These results are summarized in the following lemmas,
where $C_s$, $C_{ss}$ and $C_t$, $C_{tt}$ denote the first and second partial derivatives of $C(s,t)$ with respect to
$s$ and $t$, respectively.

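Lemma 7.5. (Expectation of $\hat C(s,t)$): Let $K_2 = \int x^2K(x)dx$,

\[
\bar Q_h(s) = \sum_{l=1}^{L_n}Q_h(s - s_l), \quad\text{and}\quad \bar Q_h^{(2)}(s) = \sum_{l=1}^{L_n}\Big(\frac{s - s_l}{h}\Big)^2Q_h(s - s_l).
\]

Then, for $|s - t| > \frac{A}{2}h_n$,

\begin{align*}
E\hat C(s,t) ={}& C(s,t)\bar Q_h(s)\bar Q_h(t) \\
&+ \frac{h^2}{2}C(s,t)\Big[\frac{g''(s)}{g(s)}\big(K_2\bar Q_h(s) - \bar Q_h^{(2)}(s)\big)\bar Q_h(t) + \frac{g''(t)}{g(t)}\big(K_2\bar Q_h(t) - \bar Q_h^{(2)}(t)\big)\bar Q_h(s)\Big] \\
&+ h^2C_s\frac{g'(s)}{g(s)}\big(K_2\bar Q_h(s) - \bar Q_h^{(2)}(s)\big)\bar Q_h(t) + h^2C_t\frac{g'(t)}{g(t)}\big(K_2\bar Q_h(t) - \bar Q_h^{(2)}(t)\big)\bar Q_h(s) \\
&+ \frac{h^2}{2}\Big[C_{ss}\big(K_2\bar Q_h(s) - \bar Q_h^{(2)}(s)\big)\bar Q_h(t) + C_{tt}\big(K_2\bar Q_h(t) - \bar Q_h^{(2)}(t)\big)\bar Q_h(s)\Big] \\
&+ O(h^{2+\alpha}). \tag{50}
\end{align*}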

Note that because of property (iii) of the kernel $Q$, and the fact that $s_l = (l + a)h$ for $l =
1,\ldots,L_n$, for some constant $a \in [-3, 3]$, we have for $s \in (c, 1 - c)$, for some $c \in (0, 1)$,

\[
\sum_{l=1}^{L_n}Q_h(s - s_l) = 1.
\]

Therefore, we can choose $L_n$ and the sequence of points $\{s_l\}_{l=1}^{L_n}$ so that $L_n \asymp h^{-1}$, and $\bar Q_h(s) = 1$
for all $s \in [0, 1]$. That is, from Lemma 7.5, we have $E\hat C(s,t) = C(s,t) + O(h^2)$.
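The partition-of-unity property used here can be checked directly for the centered cubic B-spline. The sketch below is illustrative only: it assumes $Q_h(x) = Q(x/h)$, equally spaced knots $s_l = (l + a)h$ with the knot index ranging over all integers near $s$ (so no boundary effects), and hypothetical function names. It returns both the plain sum of the weights and the sum weighted by the squared scaled offsets $((s - s_l)/h)^2$.

```python
def cubic_bspline(x):
    """Centered cubic B-spline, supported on [-2, 2]."""
    a = abs(x)
    if a >= 2.0:
        return 0.0
    if a >= 1.0:
        return (2.0 - a) ** 3 / 6.0
    return (3.0 * a ** 3 - 6.0 * a ** 2 + 4.0) / 6.0


def q_sums(s, h, shift=0.0):
    """Return (sum_l Q_h(s - s_l), sum_l ((s - s_l)/h)^2 Q_h(s - s_l)).

    Assumes Q_h(x) = Q(x / h) and knots s_l = (l + shift) * h. Only knots
    within 2h of s contribute, so a window of seven indices is enough.
    """
    centre = int(round(s / h - shift))
    q0 = 0.0
    q2 = 0.0
    for l in range(centre - 3, centre + 4):
        r = s / h - (l + shift)  # scaled offset (s - s_l) / h
        w = cubic_bspline(r)
        q0 += w
        q2 += r * r * w
    return q0, q2


if __name__ == "__main__":
    for s in (0.0, 0.13, 0.5, 0.987):
        print(s, q_sums(s, h=0.05, shift=0.3))
```

Under these assumptions the first sum equals one for every $s$, and the second is constant in $s$ as well, so the $O(h^2)$ bias terms do not depend on where $s$ falls relative to the knots.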

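Lemma 7.6. (Expectation of $\hat C_*(t)$): Let $\bar C'(t)$ and $\bar C''(t)$ denote the first and second derivatives
of the function $\bar C(t) := C(t,t)$. Then, uniformly in $t$,

\begin{align*}
E\hat C_*(t) ={}& (\bar C(t) + \sigma^2)\bar Q_h(t) + \frac{h^2}{2}(\bar C(t) + \sigma^2)\frac{g''(t)}{g(t)}\big(K_2\bar Q_h(t) - \bar Q_h^{(2)}(t)\big) \\
&+ h^2\bar C'(t)\frac{g'(t)}{g(t)}\big(K_2\bar Q_h(t) - \bar Q_h^{(2)}(t)\big) + \frac{h^2}{2}\bar C''(t)\big(K_2\bar Q_h(t) - \bar Q_h^{(2)}(t)\big) \\
&+ O(h^{2+\alpha}). \tag{51}
\end{align*}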

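The proof of Lemma 7.6 follows along the lines of that of Lemma 7.5. Furthermore, if an estimator $\hat\sigma^2$ is
such that $E\hat\sigma^2 = \sigma^2 + O(h^2)$, then it follows from Lemma 7.6 that the estimator $\hat{\bar C}(t) := \hat C_*(t) - \hat\sigma^2$
satisfies

\[
E\hat{\bar C}(t) = \bar C(t) + O(h^2), \tag{52}
\]

uniformly on $t \in [0,1]$, since $\bar Q_h(t) = 1$ on $t \in [0,1]$. Next, since $C(s,t) = C(t,s)$ and $C(\cdot,\cdot)$ is
smooth, it follows that $C_s - C_t = 0$ along the diagonal $s = t$, so the first-order term in the expansion of $C(s,t)$ around $(\frac{s+t}{2}, \frac{s+t}{2})$ vanishes. Consequently, using a Taylor series expansion, it follows that,
for any $A > 0$,

\[
C(s,t) = \bar C\Big(\frac{s+t}{2}\Big) + O(h^2), \quad\text{for } |s - t| \le \frac{A}{2}h. \tag{53}
\]

Combining (52) and (53) we get

\[
E\hat{\bar C}\Big(\frac{s+t}{2}\Big) = C(s,t) + O(h^2), \quad\text{for } |s - t| \le \frac{A}{2}h,\ s,t \in [0,1]. \tag{54}
\]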
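The midpoint approximation behind (53) can be illustrated with a short numerical check. This is a sketch under stated assumptions rather than part of the proof: `example_cov` is a hypothetical smooth, symmetric covariance kernel, and the function names are illustrative. For a symmetric kernel the error $|C(s,t) - C(m,m)|$, with $m = (s+t)/2$, should shrink by roughly a factor of four when $|s - t|$ is halved.

```python
import math


def example_cov(s, t):
    """Hypothetical smooth, symmetric covariance kernel (illustration only)."""
    return math.exp(-(s - t) ** 2) + s * t


def midpoint_error(s, t, cov):
    """Absolute error of replacing C(s, t) by C(m, m), m = (s + t) / 2."""
    m = 0.5 * (s + t)
    return abs(cov(s, t) - cov(m, m))


def error_ratio(centre, delta, cov):
    """Ratio of midpoint errors at half-widths delta and delta / 2."""
    wide = midpoint_error(centre + delta, centre - delta, cov)
    narrow = midpoint_error(centre + delta / 2, centre - delta / 2, cov)
    return wide / narrow


if __name__ == "__main__":
    for delta in (0.1, 0.05, 0.025):
        print(delta, error_ratio(0.5, delta, example_cov))
```

A ratio close to four is what a second-order error in $|s - t|$ predicts; a kernel that is not symmetric would leave a first-order term and give a ratio near two instead.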
Appendix D

Proof of Proposition 5.1 

We shall extensively use the following representation 

Hu{x, y) = Hu{x, y) - t-(5(x - y), 

A,, 



(55) 



where H^{x,y) 



Y] T ^'4'k{x)'il^k{y) + -r-ip,y{x)ip,y{y). 



l<k^u<AI 

The first step is to express Cc{s,t) as Cc{s,t) — Wr (s,i)(a^ — o"^), where 



s + t, 2^ 



C,is,t) = W~^ is,t)Cis,t) + W~^ {s,t){C,{^-) - a') 



(56) 



Therefore, in order to separate the effect of estimating o"^, use the fact that for any fixed e > 0, 



H.CciJuM < {l + e)\\H,C,iP,\\i + {l + -\{a'-a'y\\H,Wrt 



1 






(1 + e) II H.Cc^P, 11^ + 1 + _ (5^ _ cjyO{hl). 



(57) 



The equality follows since using Hyipy = 0, the definition of Wj^ , and the Mean Value Theorem, 
we have 



\{H,Wri^,){x)\ 



< 



Hu{x,s) 

^^ II ^' I 

o II Th> I 



(s-^)Al 



(s+^)VO 



(^i,(t) - il)y{s))dtds 



\Hy{x,s)\ds + 



Xu 



Since E(a^ — o"^)^ = o(l), it is enough to show that E || H^Ccipu 111 has the bound given by the 
RHS of (40), without the multiplicative factor (1 + e). With a slight abuse of notation, we write 
C{s) to indicate C*(s) — o"^. Then, since 



{H,Cci^,){x) 



H^{x,s)Wr is,t)Cis,t)Mt)dsdt+ I I H^ix,s)Wr {s,t)Ci^)Mt)dsdt, 



it follows that, || H^Cctpu Hi equals 



+ 



+ 



H^{x,si)H^{x,S2)Wr {si,ti)Wr (s2,t2) 

■ C{si,ti)C{s2,t2)'4'u{ti)'ipuit2)dsids2dtidt2dx 

I I I I I ^'^^' '^^^''^''' S2)W~^Jsi,h)W~^^ {S2,t2) 

■ C{ — - — )C( — - — )ipuih)ipuit2)dsids2dtidt2dx 

H^{x,Si)Hi,{x,S2)Wr {si,ti)Wr is2,t2) 

C{si,ti)C{ — - — )ipu{ti)tpuit2)dsids2dtidt2dx. 




(58) 
Thus, in order to obtain $E\|H_\nu\tilde C_c\psi_\nu\|_2^2$, we need to evaluate the quantities $E[\hat C(s_1,t_1)\hat C(s_2,t_2)]$,
$E[\hat{\bar C}(\frac{s_1+t_1}{2})\hat{\bar C}(\frac{s_2+t_2}{2})]$, and $E[\hat C(s_1,t_1)\hat{\bar C}(\frac{s_2+t_2}{2})]$.
Let 



u,{s,t)= Y. — Y1 yijYij'Ks,imj)Ks,imj')Qhis - si)Qt,{t - si,), 






(59) 



where $K_{s,l}(\cdot)$ is as in (44). Then we can express the expectation of the first term on the RHS of
(58) as

■ [9{si)g{s2)g{ti)g{t2)]~^E[Ui{si,ti)Ui{s2, t2)]'ipy{ti)%l)u{t2)dsids2dtidt2dx 
H — 2 X] w{mi^)w{ini^) I I I I I H^{x, si)H^{x, S2)W-f^^{si,ti)Wj^^{s2M) 

■ [g{si)g{s2)giti)g{t2)]~^E[Uij^{si,ti)Ui2{s2,t2)]tpu{h)ipu{t2)dsids2dtidt2dx. (60) 

The following proposition is the key to get a simplified bound on (60). It is proved using a lengthy, 
but fairly straightforward calculation. The details are given in Appendix F. 

Proposition 7.1. Suppose that $A > 4(B_K + C_Q)$. Then for $|s_k - t_k| > \frac{A}{2}h_n$ ($k = 1, 2$), we have

1 ^ 2, MU^{SlM)Ui{s2,t2)] 
—^yW (mi) ; ; — ; ^ — ; r — ; — 

"'^ 9isi)gis2)giti)g{t2) 

_ 2^^^ (m, - 2)(mi -3) 



n ^ mi{mi - 1) 



[{C{suti) + 0{h^))iC{s2,t2)+0{hl)) 



HC{si,S2) + 0{hl)){Citut2) + OihD) + {C{s,,t2) + 0{hl)){C{s2, ti) + Oihl))] 
+Z1 + Z2 + Z3 + Z4 + Z5 + Z6, (61) 

where $Z_j := Z_j(s_1,s_2,t_1,t_2)$, $j = 1,\ldots,6$; the quantities $Z_1,\ldots,Z_4$ are asymptotically equivalent
to $Z(s_1,s_2)$, $Z(s_1,t_2)$, $Z(t_1,s_2)$ and $Z(t_1,t_2)$, respectively; and $Z_5, Z_6$ are asymptotically
equivalent to $Z(s_1,s_2,t_1,t_2)$ and $Z(s_1,t_2,t_1,s_2)$, respectively, where

(0(^^) if\s-t\<^ 
I otherwise; 



and 



Z{si,S2,ti,t2) = < 



'0{^U^) ^/max{|si-S2|,|ti-t2|}<^ 

0{^) if\s^-S2\<^-and\t,-t2\>^ 

0{^) ^f \si -S2\>^ and |ti -t2\<^ 

otherwise. 
Also,

— r- > W(mi,]w(mi„) r ; r -. 7 ; — 

n — 1 



n^ H". 9{si)g{s2)g{ti)g{t2 



n 



-{Cisi,ti)+Oihi))iCis2,t2)+0{hi)) 



1 "" 

+;^( E /'M2) [icisi,s2) + oihl)){cih,t2) + o{hl)) 

+(C(5i,t2) + 0(/i^))(C7(s2,ti) + 0(/i^))]. (62) 

In all of the above, the $O(\cdot)$ terms are uniform in $s_1, s_2, t_1, t_2$ in their respective domains.
Now we deal with the last two terms on the RHS of (58). Let

^^(«) = E-E^^KK7^..)- (63) 



rrij 
1=1 j=i 



Then, 

- 1 '^ _ 

a{s) = -Y,[9is)]-'V^{s)Qh„{s-Sl). 



n . 
i=\ 



For convenience, in the rest of this subsection we shall use $z_k$ to denote $(s_k + t_k)/2$, for $k = 1, 2$. Then
the following proposition describes the contribution of the quantities of the type $E[V_{i_1}(z_1)V_{i_2}(z_2)]$
and $E[U_{i_1}(s_1,t_1)V_{i_2}(z_2)]$.

Proposition 7.2. Suppose that $A > 4(B_K + C_Q)$. Then for (i) $|s_k - t_k| \le \frac{Ah_n}{2}$, $k = 1, 2$,

" T^ 9{zi)g{z2) n^ .^. g{zi)g{z2) 

= C{si,ti)C{s2,t2)- f^V— )(C(si,ti) + a2)(C(s2,t2)+a') + 0(/i^) 
I 1 " 1 1 '^ \ 

+ -(1--E— ) + -E p1^2 {C{si.S2)C{tut2) + Cisi,t2)Cis2,ti) + Oihn)) 
\ «=1 J17^«2 / 

+Zj, (64) 

where $Z_7 := Z_7(z_1,z_2)$ is asymptotically equivalent to $Z(z_1,z_2)$. Next, if (ii) $|s_1 - t_1| > \frac{Ah_n}{2}$ and
$|s_2 - t_2| \le \frac{Ah_n}{2}$, then



2 
n 



— Y,w{m,)E{Ui{si,ti)Vi{z2)) + — ^ w{m,,)E{Ui,{si,ti)V^,{z2)) - a^EC{si,ti) 

2 = 1 ^17^22 

= (C(si, ti) + 0{hl)){C{s2, t2) + 0{hl)) 

- ( A E — ) (Cisuh) + 0{hl))iC{s2,t2) +a' + 0{hl)) 

/i 1 " 2 1 "" \ 

+Zs + Zg, (65) 

where the OQi^) terms within brackets in the first term on the RHS depend on (si,ti) and (521^2) 
respectively, and Zj := Zj{si,ti, Z2), j = 8,9 satisfy 



OiTjhl) ^f\si-S2\< 2 



^ Ah„ 

I otherwise; 

Z^ ^ |0(;i4l) ^f\tl-S2\<^ 
I otherwise. 

The proof of Proposition 5.1 is now completed by using the definitions of E[C(si, ti)C(s2, ^2)]) 

E[d{^^)d{^^)], and E[C{si,ti)d{^^)]; using the properties of the kernel H^{x,y); and the 
bounds in Propositions 7.1 and 7.2 and plugging everything back into the expectation of (58). The 
details can be found in Appendix F. 

Appendix E 

Asymptotic pointwise variance (32) 

In this section, we prove (32), (33) and (34). Most of the derivations are similar to that of Propo- 
sition 5.1. Thus we simply give a brief outline. 

First, using the fact that Wr (s,t)Wr (s,t) = 0, we obtain 

Var(Ce(s,t)) = Wr (s,t)Var(C(s,t)) + W^r (s,t)Var(a(^)-?') 

,S + t,^ , ,, ,_2n 



Var(C,(^-)) + Var(a^) 



< W~^ is,t)VariCis,t)) + 2W~^ {s,t) 

Since E(a'^ — u^)^ has the rate given by (33) (Corollary 4.1), we only need to provide bounds for 
Wj^Js,t)Yav{C{s,7 

Proposition 7.3. 



Wr (s,tWar(C(s,t)) and Wr (s, t)Var(C=i.(^)). We state these in the following propositions. 



-] + \ ^ y^ Phi-z 0(l)+0(max{ ,„ o ,^— 
nj \n^.^ ^^j V nhlrrj^ nhnl 



W,^S'.t)Var(C[s,t)) = O ( - + ^ ^ ,^,, 0(1) + O l^^^^^^i -—•, ) , (66) 
Proposition 7.4.

W^J,,i)VariC.it±i)) = O (i) + (^ E pA 0(1) +0 (^) . (67) 

The proof of (34) is finished by combining Propositions 7.3 and 7.4 and Corollary 4.1. 

Proof of Corollary 4.1 

First observe that, 

E(a2-a2)2 = ^^^ _^^^^^ ^"^' ^"^' E[(a(t) - a^ - ^o(t))(a(g) - a^ - t^{s))\dsdt 

< sup E(C'*(i) — cr^ — Co(t)) (by Cauchy-Schwarz inequality) 

te[To,Ti] 

< 2 sup Var(C=,(t)) + 2 sup Var(do(t)) 

t6[To,ri] te[To,Ti] 

+ sup (E(a(t))-a2-E(^o(t)))^ (68) 

te[To,Ti] ^ ^ 

By Propositions 7.3 and 7.4, and the definition (12) of $\hat C_0$, the sum of the first two terms on the
RHS of (68) is bounded by






On the other hand, since for any bounded u G [^41,^2]) 



-{C{t - hnU, t + hnU) + C{t + hnU, t - hnU)) - C{t, t) 



uniformly in t G [To,Ti], it follows from Lemmas 7.5 and 7.6 (Appendix C) that the last term on 
the RHS of (68) is 0{h^). 

Proof of Proposition 5.2 

Without loss of generality we assume $g$ to be the uniform density on $[0, 1]$. We need to consider two
cases separately: (i) $|s - t| > \frac{Ah}{2}$ and (ii) $|s - t| \le \frac{Ah}{2}$.

(i) $|s - t| > \frac{Ah}{2}$: In this case, we have $\hat C_c(s,t) - E[\hat C_c(s,t)] = W_{Ah}(s,t)(\hat C(s,t) - E[\hat C(s,t)])$. Let

\[
B_i(s,T_{ij}) = \sum_{l=1}^{L_n}K_{s,l}(T_{ij})Q_h(s - s_l), \quad 1 \le j \le m_i,\ 1 \le i \le n.
\]

Since $|K_{s,l}(T_{ij})| = O(h^{-1})$ and the summands are nonzero for finitely many $l$, there exists a
constant $C_3 > 0$ such that

\[
\sup_{s\in[0,1]}\max_{1\le i\le n}\max_{1\le j\le m_i}|B_i(s,T_{ij})| \le C_3h^{-1}. \tag{69}
\]




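Note further that $B_i(s,T_{ij}) = 0$ if $|s - T_{ij}| > 2(B_K + C_Q)h$. Next,

\begin{align*}
\frac{1}{m_i}\sum_{j=1}^{m_i}Y_{ij}B_i(s,T_{ij})
&= \sum_{k=1}^{M}\sqrt{\lambda_k}\,\xi_{ik}\Big(\frac{1}{m_i}\sum_{j=1}^{m_i}\psi_k(T_{ij})B_i(s,T_{ij})\Big) + \sigma\frac{1}{m_i}\sum_{j=1}^{m_i}\varepsilon_{ij}B_i(s,T_{ij}) \\
&= \sum_{k=1}^{M}\sqrt{\lambda_k}\,\xi_{ik}B_{i\cdot,k}(s) + \sigma\frac{1}{m_i}\sum_{j=1}^{m_i}\varepsilon_{ij}B_i(s,T_{ij}),
\end{align*}

where $B_{i\cdot,k}(s) := \frac{1}{m_i}\sum_{j=1}^{m_i}\psi_k(T_{ij})B_i(s,T_{ij})$. By (69), there exists $C_4 > 0$ such that

\[
\sup_{s\in[0,1]}\max_{1\le k\le M}\max_{1\le i\le n}|B_{i\cdot,k}(s)| \le C_4h^{-1}. \tag{70}
\]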

Also, since $A > 4(B_K + C_Q)$ and since $|s - t| > \frac{Ah}{2}$, it follows that $B_i(s,T_{ij})B_i(t,T_{ij}) = 0$.
Moreover, $B_i(s,T_{ij})B_i(t,T_{ij'}) \ne 0$ only if $1 \le j \ne j' \le m_i$ are such that $|s - T_{ij}| \le 2(B_K +
C_Q)h$ and $|t - T_{ij'}| \le 2(B_K + C_Q)h$. This implies that

\[
P\big(B_{i\cdot,k}(s)B_{i\cdot,k}(t) \ne 0\big) \le C_5m_i(m_i - 1)h^2 \quad\text{for some } C_5 := C_5(A) > 0.
\]

Furthermore, for each $k = 1,\ldots,M$, $\{B_{i\cdot,k}(s)\}_{i=1}^{n}$ are independent, and these random variables
are independent of $\{\xi_{ik} : 1 \le k \le M\}_{i=1}^{n}$ and $\{\varepsilon_{ij} : 1 \le j \le m_i\}_{i=1}^{n}$. Then, we can
express $\hat C(s,t) - E[\hat C(s,t)]$ as,

C{s,t)-E[C{s,t)] 

I " 

= ^ \/%Ak'-^^ikiik'iw{rni)Bii^k{s)Bu^k'{t) 

l<k^k'<M i=l 



+ IZ ^fc- IZ(^fci - l)w^K)^l*,fc(s)5H,fc(i) 
fc=l i=l 

+ 5^ Afc- ^ u;(mi)(Si,,fc(s)-Bii,fc(t) - ¥.{Bu,k{s)Bu,k{t))) 

k=l i=l 

M n -. rrii 

+cr^V>^k-^w{mi)^^iikeij{Bii^k{s)Bi{t,Tij) + Bu^k{t)Bi{s,Tij)) 



n '■ — ' nii 

k=\ i=\ j=i 



-, n / \ rrii 

+^^~ Z] ^;;2^ Z] ^ij^ij'Bris, Tij)B,{t, Tij.) 

+^^-E^E(4 - i)A(^,T,,)i?.(t,r,,) 

i=\ ' j=l 

-. n / N rrii 

+(T^-^^^^^^{Biis,Tij)B,{t,Tij)-EiB,is,Tij)B,{t,T,,))). (71) 

'^ j=i "^j j=i 






The last two terms in the above expression vanish since $|s - t| > 4(B_K + C_Q)h$. Note
that $\max_{1\le i\le n}w(m_i)$ is bounded. By (70), $|B_{i\cdot,k}(s)B_{i\cdot,k'}(t)| \le C_4^2h^{-2}$ for $k, k' =
1,\ldots,M$, and for all $k, k'$,

max Yai{Bii k{s)Bii k'{t)) < Cq max{(mM~^ (zZi^)"^} for Cq = Ce{A) > 0, (72) 

l<i<n ' ' 

(see Appendix F). Thus by Bernstein's inequality, and using the condition that m^ = o{nh? / log n), 
given ?? > 0, there exists ci^ri > such that for sufficiently large n (so that the bound in (72) 
is 0((m/i)-2)), 



max 

fc=l,...,Af 



1 " 

-y^w{m,){Bu,k{s)Bu,k{t) - E{Bu,k{s)Bu,k{i))) 



n . 
1=1 



^^ / log re \ ^^„^ 
' y nh?m? I ~ 



(73) 
Next, let A be the set of indices i such that i?ij^fc(s)-Bij,A:'(0 7^ fo^' some k,k' . And let 
Nn = \A\. Since for any fc,A;', P{Bii^k{s)Bii^k'{t) / 0) < C^m^h'^, it follows by another 
application of Bernstein's inequality that there exists a set Dn (in the sigma field generated 
by {Tij}) and a constant C2,.;y > such that 

Dn = {Nn < C2,rjnfrfh'^} and P(D„) > 1 - re"''. 

Therefore we can restrict our attention to the set Dn, and conditioning on T We can express 
^A,k = {Cik)i&A as ^_4_fc := (R^^)^/2|^^fc, where the random vectors |_4fc have Nn„{0,I) 
distribution and are independent for different /s's. Then we can write (conditionally on T) 



J2^ki^k'iw{mi)Bii^k{s)Bii^k'{t) = ^Afc^(T)^Afc'' 



where $(T) = (RaaY^'^ diag{w{mi)Bii^k{s)Bii^k'{t))i&A{^AAY''^ ■ Observe that by (70) and 
condition C3, we have || $(T) ||< C4Knh~'^. Therefore, by an application of Lemma 7.2, we 
have, for some cs^^ > 0, 



1 /I 

-'^Cki&iw{mi)Bii^k{s)Bii^k'{t)\ > C3,r,mK„W— -2-,-Dn) < n~''. 



■ n ^ — ' " " ' ' V re/i^ 

j=i 

Very similar arguments can be used to obtain bounds of order ^«^n\/-^f? (that hold with 
probability at least 1 — 0{n~^), for any given rj > 0) for the second, fourth and fifth terms 
on the RHS of (71). Thus, by conditions on Kn and hn, we have, for some constant 04,^^ > 0, 



^ ^ /loffre 

n\WAhis,t)iCcis,t) -EiCc{s,t))\ > CA,rfnKn^j ^) < re"^. (74) 

(ii) \s-t\ < ^: In this case, we have Cc{s,t) - E[dc{s,t)] = WAh{s,t){C^{^) - E[C,(^)]) 
(ignoring the maximum over /i^ in the definition). Then similar (but somewhat simpler) 
arguments, now involving Lemma 7.3, show that for some cs^^ > 0, 



F{\WAh{s,t){C4'-^) -E[C,C-±^)])\ > c,,^mKn^f^) < re-^. (75) 

Combining (74) and (75) we obtain the result. 
Appendix F

Details of computation of $G_j(\cdot)$

We want to give an explicit functional form for $G_j(y)$, for $j = 0, 1$ and for any $y \in \mathbb{R}$. Let

\begin{align*}
B_1(x) &= x^3/6 \\
B_2(x) &= (-3x^3 + 3x^2 + 3x + 1)/6 \\
B_3(x) &= (3x^3 - 6x^2 + 4)/6 \\
B_4(x) &= (1 - x)^3/6.
\end{align*}

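Then the centered version of the cubic B-spline $Q$ has the form

\[
Q(x) =
\begin{cases}
B_1(x + 2) & \text{for } -2 \le x \le -1 \\
B_2(x + 1) & \text{for } -1 \le x \le 0 \\
B_3(x) & \text{for } 0 \le x \le 1 \\
B_4(x - 1) & \text{for } 1 \le x \le 2 \\
0 & \text{otherwise}
\end{cases}
=
\begin{cases}
\frac{1}{6}(x + 2)^3 & \text{for } -2 \le x \le -1 \\
\frac{1}{6}(-3x^3 - 6x^2 + 4) & \text{for } -1 \le x \le 0 \\
\frac{1}{6}(3x^3 - 6x^2 + 4) & \text{for } 0 \le x \le 1 \\
\frac{1}{6}(2 - x)^3 & \text{for } 1 \le x \le 2 \\
0 & \text{otherwise.}
\end{cases}
\]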



Note that $G_j(y)$ can then be computed by utilizing the fact that, for $j = 0, 1$,

\[
G_j(y) = \int_{-2}^{(y + \frac{A}{2})\wedge 2}x^jQ(x)dx - \int_{-2}^{(y - \frac{A}{2})\wedge 2}x^jQ(x)dx,
\]

where the integrals on the right-hand side are defined to be zero if the corresponding upper limits are
less than $-2$. The integrals on the RHS of the above equation can be computed from the representation
of $Q(\cdot)$ as follows:



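\begin{align*}
\int_{-2}^{b}Q(x)dx &= \frac{1}{24}(2 + b)^4, && \text{for } -2 \le b \le -1 \\
\int_{-1}^{b}Q(x)dx &= \frac{1}{24}(-3b^4 - 8b^3 + 16b + 11), && \text{for } -1 \le b \le 0 \\
\int_{0}^{b}Q(x)dx &= \frac{1}{24}(3b^4 - 8b^3 + 16b), && \text{for } 0 \le b \le 1 \\
\int_{1}^{b}Q(x)dx &= \frac{1}{24}(1 - (2 - b)^4), && \text{for } 1 \le b \le 2,
\end{align*}

\begin{align*}
\int_{-2}^{b}xQ(x)dx &= \frac{1}{30}(2 + b)^5 - \frac{1}{12}(2 + b)^4, && \text{for } -2 \le b \le -1 \\
\int_{-1}^{b}xQ(x)dx &= \frac{1}{60}(-6b^5 - 15b^4 + 20b^2 - 11), && \text{for } -1 \le b \le 0 \\
\int_{0}^{b}xQ(x)dx &= \frac{1}{60}(6b^5 - 15b^4 + 20b^2), && \text{for } 0 \le b \le 1 \\
\int_{1}^{b}xQ(x)dx &= \frac{1}{30}(2 - b)^5 - \frac{1}{12}(2 - b)^4 + \frac{1}{20}, && \text{for } 1 \le b \le 2.
\end{align*}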
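The closed forms for the partial integrals of the cubic B-spline can be assembled into a routine for $\int_{-2}^{b}x^jQ(x)dx$ and compared against numerical quadrature. The sketch below is illustrative: the function names are hypothetical, each closed form is read as the integral over the single spline piece starting at the knot to the left of $b$, and `g_integral` assumes that the quantity of interest is the integral of $x^jQ(x)$ over the window $[y - A/2,\, y + A/2]$ intersected with $[-2, 2]$.

```python
def cubic_bspline(x):
    """Centered cubic B-spline, supported on [-2, 2]."""
    a = abs(x)
    if a >= 2.0:
        return 0.0
    if a >= 1.0:
        return (2.0 - a) ** 3 / 6.0
    return (3.0 * a ** 3 - 6.0 * a ** 2 + 4.0) / 6.0


def piece_integral(j, left, b):
    """Integral of x^j Q(x) over [left, b], left in {-2,-1,0,1}, left <= b <= left+1."""
    if j == 0:
        if left == -2:
            return (2 + b) ** 4 / 24
        if left == -1:
            return (-3 * b ** 4 - 8 * b ** 3 + 16 * b + 11) / 24
        if left == 0:
            return (3 * b ** 4 - 8 * b ** 3 + 16 * b) / 24
        return (1 - (2 - b) ** 4) / 24
    if left == -2:
        return (2 + b) ** 5 / 30 - (2 + b) ** 4 / 12
    if left == -1:
        return (-6 * b ** 5 - 15 * b ** 4 + 20 * b ** 2 - 11) / 60
    if left == 0:
        return (6 * b ** 5 - 15 * b ** 4 + 20 * b ** 2) / 60
    return (2 - b) ** 5 / 30 - (2 - b) ** 4 / 12 + 1 / 20


def cumulative(j, b):
    """Integral of x^j Q(x) from -2 to min(b, 2); zero if b <= -2."""
    if b <= -2:
        return 0.0
    b = min(b, 2.0)
    total = 0.0
    for left in (-2, -1, 0, 1):
        total += piece_integral(j, left, min(b, left + 1))
        if b <= left + 1:
            break
    return total


def g_integral(j, y, big_a):
    """Assumed form: integral of x^j Q(x) over [y - A/2, y + A/2]."""
    return cumulative(j, y + big_a / 2) - cumulative(j, y - big_a / 2)


def midpoint_integral(j, b, n_grid=20000):
    """Numerical check: midpoint rule for the integral of x^j Q(x) on [-2, b]."""
    step = (b + 2.0) / n_grid
    total = 0.0
    for i in range(n_grid):
        x = -2.0 + (i + 0.5) * step
        total += x ** j * cubic_bspline(x)
    return total * step
```

Summing whole pieces before adding the partial one keeps each closed form within the interval on which it is valid, so no numerical integration is needed when the routine is called repeatedly inside a bandwidth search.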
Details of the calculation of pointwise bias

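Performing a Taylor series expansion around $(s,t)$ we get,

\begin{align*}
g(s_l + xh) &= g(s) + h\Big(\frac{s_l - s}{h} + x\Big)g'(s) + \frac{h^2}{2}\Big(\frac{s_l - s}{h} + x\Big)^2g''(s) + O\Big(\Big(\Big|\frac{s_l - s}{h}\Big|^{2+\alpha} + |x|^{2+\alpha}\Big)h^{2+\alpha}\Big), \\
g(s_{l'} + yh) &= g(t) + h\Big(\frac{s_{l'} - t}{h} + y\Big)g'(t) + \frac{h^2}{2}\Big(\frac{s_{l'} - t}{h} + y\Big)^2g''(t) + O\Big(\Big(\Big|\frac{s_{l'} - t}{h}\Big|^{2+\alpha} + |y|^{2+\alpha}\Big)h^{2+\alpha}\Big), \tag{76}
\end{align*}

and

\begin{align*}
C(s_l + xh, s_{l'} + yh) ={}& C(s,t) + h\Big(\frac{s_l - s}{h} + x,\ \frac{s_{l'} - t}{h} + y\Big)\begin{pmatrix}C_s(s,t) \\ C_t(s,t)\end{pmatrix} \\
&+ \frac{h^2}{2}\Big(\frac{s_l - s}{h} + x,\ \frac{s_{l'} - t}{h} + y\Big)\begin{pmatrix}C_{ss} & C_{st} \\ C_{ts} & C_{tt}\end{pmatrix}\begin{pmatrix}\frac{s_l - s}{h} + x \\ \frac{s_{l'} - t}{h} + y\end{pmatrix} \\
&+ O\Big(\Big(\Big|\frac{s_l - s}{h}\Big|^{2+\alpha} + \Big|\frac{s_{l'} - t}{h}\Big|^{2+\alpha} + |x|^{2+\alpha} + |y|^{2+\alpha}\Big)h^{2+\alpha}\Big). \tag{77}
\end{align*}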

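First we consider the off-diagonal terms, i.e., compute $E\hat C(s,t)$ for $|s - t| > \frac{A}{2}h$.

• $h^0$ terms : Since $\int K(x)dx = 1$ and $\int K'(x)dx = 0$,

\[
\iint C(s,t)\Big(K(x) + \frac{s - s_l}{h}K'(-x)\Big)\Big(K(y) + \frac{t - s_{l'}}{h}K'(-y)\Big)dx\,dy = C(s,t). \tag{78}
\]

• $h^1$ terms : Since $\int xK'(-x)dx = 1$, $\int xK(x)dx = 0$, and $\int K(x)dx = 1$,

\[
h\iint\Big[\Big(\frac{s_l - s}{h} + x\Big)C_s + \Big(\frac{s_{l'} - t}{h} + y\Big)C_t\Big]\Big(K(x) + \frac{s - s_l}{h}K'(-x)\Big)\Big(K(y) + \frac{t - s_{l'}}{h}K'(-y)\Big)dx\,dy = 0, \tag{79}
\]

and

\[
hC(s,t)\iint\Big[g(s)g'(t)\Big(\frac{s_{l'} - t}{h} + y\Big) + g'(s)g(t)\Big(\frac{s_l - s}{h} + x\Big)\Big]\Big(K(x) + \frac{s - s_l}{h}K'(-x)\Big)\Big(K(y) + \frac{t - s_{l'}}{h}K'(-y)\Big)dx\,dy = 0. \tag{80}
\]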

n n 

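• $h^2$ terms : Since $\int x^2K'(-x)dx = 0$, $\int xK'(-x)dx = 1$, $\int xK(x)dx = 0$, and $\int K(x)dx = 1$, and writing $\kappa_l(x) := K(x) + \frac{s - s_l}{h}K'(-x)$ and $\kappa_{l'}(y) := K(y) + \frac{t - s_{l'}}{h}K'(-y)$ for brevity,

\begin{align*}
&\frac{h^2}{2}C(s,t)\iint\Big[g''(t)g(s)\Big(\frac{s_{l'} - t}{h} + y\Big)^2 + g''(s)g(t)\Big(\frac{s_l - s}{h} + x\Big)^2\Big]\kappa_l(x)\kappa_{l'}(y)\,dx\,dy \\
&\quad = \frac{h^2}{2}C(s,t)\Big[g''(t)g(s)\Big(K_2 - \Big(\frac{s_{l'} - t}{h}\Big)^2\Big) + g''(s)g(t)\Big(K_2 - \Big(\frac{s_l - s}{h}\Big)^2\Big)\Big];
\end{align*}

\[
h^2C(s,t)\iint\Big[\Big(\frac{s_l - s}{h} + x\Big)\Big(\frac{s_{l'} - t}{h} + y\Big)g'(s)g'(t)\Big]\kappa_l(x)\kappa_{l'}(y)\,dx\,dy = 0;
\]

\begin{align*}
&h^2\iint\Big[\Big(\frac{s_l - s}{h} + x\Big)C_s + \Big(\frac{s_{l'} - t}{h} + y\Big)C_t\Big]\Big[g(s)g'(t)\Big(\frac{s_{l'} - t}{h} + y\Big) + g'(s)g(t)\Big(\frac{s_l - s}{h} + x\Big)\Big]\kappa_l(x)\kappa_{l'}(y)\,dx\,dy \\
&\quad = h^2\Big[C_sg'(s)g(t)\Big(K_2 - \Big(\frac{s_l - s}{h}\Big)^2\Big) + C_tg(s)g'(t)\Big(K_2 - \Big(\frac{s_{l'} - t}{h}\Big)^2\Big)\Big];
\end{align*}

\begin{align*}
&\frac{h^2}{2}\iint\Big[\Big(\frac{s_l - s}{h} + x\Big)^2C_{ss} + 2\Big(\frac{s_l - s}{h} + x\Big)\Big(\frac{s_{l'} - t}{h} + y\Big)C_{st} + \Big(\frac{s_{l'} - t}{h} + y\Big)^2C_{tt}\Big]\kappa_l(x)\kappa_{l'}(y)\,dx\,dy \\
&\quad = \frac{h^2}{2}\Big[C_{ss}\Big(K_2 - \Big(\frac{s_l - s}{h}\Big)^2\Big) + C_{tt}\Big(K_2 - \Big(\frac{s_{l'} - t}{h}\Big)^2\Big)\Big].
\end{align*}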

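In summary, the $h^2$ term in the expansion is,

\begin{align*}
&h^2\Big(\frac{1}{2}g''(s)g(t)C + g'(s)g(t)C_s + \frac{1}{2}g(s)g(t)C_{ss}\Big)\Big(K_2 - \Big(\frac{s_l - s}{h}\Big)^2\Big) \\
&\quad + h^2\Big(\frac{1}{2}g(s)g''(t)C + g(s)g'(t)C_t + \frac{1}{2}g(s)g(t)C_{tt}\Big)\Big(K_2 - \Big(\frac{s_{l'} - t}{h}\Big)^2\Big). \tag{81}
\end{align*}

Proof of Lemma 7.5 : Combining (78), (79), (80) and (81), and using (76), (77) and the fact
that $\sum_{l=1}^{L_n}|\frac{s - s_l}{h}|^{2+\alpha}Q_h(s - s_l) < \infty$, after some algebra, we obtain (50).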



Combined bound on E || HyCcil^y Hi 

We put the different pieces derived in Appendix D together to obtain a bound on E || H^Ccipu Hi- 
For ease of notation, we denote by 7iu = 'Huix,si,S2,ti,t2) the integral operator with kernel 
H^{x, si)H^{x, S2)'(pu{ti)ipu{t2)- Then, with ri, r2 taking values or 1, 

nu{x,si,S2,ti,t2)iC{si,ti)Y'{C{s2,t2)Y^dsids2dtidt2dx = 0. (82) 

nu{x,si,S2,ti,t2){C{si,t2)y''{C{s2,ti)Y^dsids2dtidt2dx = 0. (83) 





x:? 



Huix, si,S2,ti,t2){C{si,S2)Y^ {C{ti,t2)Y'^dsids2dtidt2dx 

ri 



E 



l<k^u<M 



(Afc - XuY 



(84) 
Implicitly using (130) - (132), we also have the bound

\[
\Big|\int\!\cdots\!\int\mathcal{H}_\nu(x,s_1,s_2,t_1,t_2)R(s_1,s_2,t_1,t_2)\,ds_1ds_2dt_1dt_2dx\Big| = O(\|R\|_\infty). \tag{85}
\]

From Proposition 7.1, the total contribution in (60) of the first terms on the RHS of (61) and 
(62) becomes 



fl ly. 4m, -6 

In n ^ vtiiimi — 1) 



n — \ 

) + 



n 

2 

2 



H^{x,s)Wr {s,t)[C{s,t) + 0{hf,)]i;^{t)dsdt 



dx 



/ 1 , 1 ^^ 4777,2 ~ 6 , 1 ^^ 2 

\n n^ mAnii-l) n? Z^ ^*i'2 

■[[[[[ H,{x,Si)H,{x, S2)W J Jsi,h)W~^Js2,t2) 

{C{si,S2) + 0{hl)){C{ti,t2) + 0{hl))^y{ti)^y{t2)dsids2dtidt2dx 

( 

/ 1 , 1 v-^ 4rTT,j — 6 , 1 v-^ 9 

+ - 1 - - y —-^ — - + ^ y p\i 



^ •miirrii 



n^i2 



■[[[[[ H,{x,Si)H,{x,S2)W~^Jsi,ti)W~^Js2,t2) 

{C{siM) + 0{hl)){C{s2,ti) + 0{hl))^y{ti)^y{t2)dsids2dtidt2dx. 

Since H^Cipu = 0, it can be checked that the first integral in (86) is 0{h'^). On the other hand, 
from the definition of Wr {s, t) and the fact that HyCipy = 0, it follows that the last integral term 
is 0{hn)- 

Next, apply Tiu to the following functions : W^ (si,ti)VF^ is2,t2)D2{si, S2,ti,t2) and 
2W-j^^{si,ti)W-j:^^{s2,t2)D3{si,S2,ti,t2), where D2{si,S2,ti,t2) and -D3(si, ^2, ti, ^2) are the terms 
given by the sum of the first three terms on the RHS of (64) (including the isolated 0{h'^) term), 
and the sum of the first three terms on the RHS of (65), respectively. Then, adding these terms 
to (86), we have, by (82) - (85), (132) (for dealing with the isolated 0{h^) term in (64)), and the 
comment following (86), that this sum equals 



n^«2 / 



" li<.fe„ (^^ - ^")V V\U, li<.feM (^' - ^-'^ 



Next, for notational convenience, express the integral operator Tip applied to Zj (where Zj are as 
in Propositions 7.1 - 7.2) times TF^Jsi, ti)W^^(s2, t2)W^,„(*i'*2) by HyW^'^^W'^'^^W'^'^^ Zj, etc. 
Using (130) - (135), and the bounds in Proposition 7.1 for $Z_j$, $j = 1,\ldots,4$, we have,

i?2 := W^W'*'W'*'Zi = HjW'''^W^'*^W'^''^Zi = O ( ^^ I , 

\nhr,m I 

i?3 := njw'''^^w'^'^^ Z2 = nJv'''^'W'^'^'w'^'*^Z2 = o (—] , 

\nm I 

R^ ■= njv'^'^'w'^'^^ Z3 = nJv'''^'w'^'^^w'^'^'Zs = o (—] , 

\ nm I 

\nm I 
Using analogous reasoning, from Propositions 7.1 and 7.2 we also have 

Rq := W^W'*W'*'Z5 = O (—^ „ , , 

\ nhnZnf' J 

Rj := njW'^'W'^'*^ Zq = O (^ 

\nrn/ 

Rs := HuW^'^'W'^'^' Zr = O ( —] , 

\nm I 

Rg ■= n^v'^'^'w^'^^Zs = o ( — ^ I , 

\nhnm J 

Rio := nJW'''^'W'^'^^Zg = o(—]. 

\nrnj 

Hence, combining (87) with the bounds for R2 to Rio, using the definitions of E[C(si,fi)C(s2, ^2)]) 
E[d(^i±*i)^(^2±t2 )]^ and K[C{si,ti)d{^^)], and plugging everything back into (58), we complete 
the proof of Proposition 5.1. The details of the key steps in this derivation are given below. 

Proof details for Proposition 5.1 

Proof of Proposition 7.1 : We need to deal with terms of the form 

-^Ji«2;iii(j2i2('^i' *!' '*2i ^2; h, h^h, h) 

for 1 < ji,j'i < rriij^, 1 < J2)i2 — '"^J2' ^ ^ H,'i'2 ^ n. For computational convenience, we also 
define, 

-^ni2;jii(i2i^('5i' *i' -52, ^2; hJi,h, h) 

= -^iii2;iij(J2j^(«l'*l'^2,i2;^l,^'l,^2,^2)Q/i(si - Sh)Qh(tl - Sl'JQhis2 " •S«2)Q/i(^2 -St 
First, consider the case $i_1 = i_2 = i$, say. Then, using $*$ to denote $E_{ii;j_1j_1'j_2j_2'}(s_1,t_1,s_2,t_2;l_1,l_1',l_2,l_2')$,
we have

¥.[Ui{si,ti)U^{s2,t2)] 



-I '"'1 -^71 -^n 



ii7^il7^j2^i^«i/i=i«2,/^=i 



+ 



4 



E + E + E 

.h=i'i¥'h¥'i'2 h=h¥'i'i¥'f2 Ji=J2^Ji¥=J2 



rui 



rrii 



mi 



+ E + E + E 

Jl¥=j'l¥=J2=J2 Jl¥'j'l=J2T^J2 Jl¥'J2T^Ji=J2. 



+ - 



mj 



E + E + E 

Jl=j'l¥=J2=J2 Jl=J2¥=j'i=j'2 h=3'2^il=J2, 



E E* 

ll,l[=ll2,li, = l 
Lin L/n 

E E* 

ll,l[ = ll2,l'2 = l 



+ 



m; 



E + E + E + E 

Ji=j'i=J2¥=J2 ii=i;=i2^i2 j\=i2=J2¥=j[ ii/il=i2=j2. 

-yyj^: ^/_^ /_^ /_^ 



Li-n Lin 

E E* 

h,l[=ll2,l2 = l 



mj 



(89) 



ii=ji=i2=i^h,/i=U2,/^=i 
Next, consider the case ii / ^2- Then, with * denoting E^^^^.^^ji j^^i (si, ii, 52, ^2; ^1, ^1, ^2> ^2)' 

E[C/,,(si,ti)[/,,(s2,t2)] 

mi, mi„ r r 

fn^ TR^ ^-~^ ^-~^ ^-~^ ^^ 



+ 



m\m\ 



jl=j{J2¥=J2 jlT^j'l J2=J2, 



Lji L,n 



rrii-, rrii^ t t 



E E* + ;;AtE E E E *•(=>») 



«i,/i=H2,;2=i 



771 ■ 777- ■ 

*i '^ h=i[h=r2hA=^h,l'2=^ 



Note that, for ah ii, ^21 if either ji = j'^ or j2 = J2, then 



/ , / ^ -£'iii2;jii(J2j2^'^l'*l'*25*2;^l,^l,^2,^2) = 0; 
^1,^1 = 1 l2,l'2 = ^ 

unless |si — ti| < -^h, or |s2 — ^2! < 2"^' respectively, for A satisfying A > A{Bk + Cq) and 

/in = Ahn- This can be verified by using the definition of -E'i^J2;iii'i2j' (-^I'^i' '^2)^2; ^ii ^i, ^2> '2)' 
equations (113), (114), (116) - (119), and arguing as in the analysis of the term (48). Therefore, 
since li^ _j |< A^Vl^^ (s^, tk) = 0, for k = 1,2, the sums corresponding to either ji = j( or J2 = j'2 in 

(89) and (90) do not contribute anything to (60). Thus, when ii ^ 12, the only sum that contributes 
to (60) corresponds to ji 7^ j'^, J2 7^ J2- When ii = ^2 = i, the sums that contribute to (60) are the 
ones corresponding to ji / j[ / J2 / J2, j'l = i2 / j'l + J2, ji = J2 / ii / J2' h + fi = h + i2> 
ii / i2 / ii = i2, ii = i2 / il = i2, and ji = j^ / j^ = j2. We consider these cases one by one. 
Lemma 7.7. If $i_1 = i_2$, $j_1 \ne j_1' \ne j_2 \ne j_2'$; or $i_1 \ne i_2$, $j_1 \ne j_1'$, $j_2 \ne j_2'$; then

= (C(si, ti) + 0(/i2))(C(s2, t2) + 0{h^)) 

+pU, [iC{si,S2) + 0{h')){C{ti,t2) + Oih")) + (C(si, t2) + 0{h')){C{s2, h) + 0{h'))\m) 

where the 0{h'^) terms are uniform in si,ti,S2,t2 G [0,1]. 

The following lemma gives an expression and the corresponding bound for the term Zi . 
Lemma 7.8. // ii = i2 = i, ji = J2 / j( / j'2, then 

1 ■^ 1 ^ V^ V^ Eii.j^j>^j^j>^{si,ti,S2,t2;h,l'i,h,l2) 
^^"^. ^■,^■nr^,r^ 9{si)g{ti)g{s2)g{t2) 

I otherwise 

The following lemma gives expressions and the corresponding bounds for the term Z2, Zj, and 
Z4. 

Lemma 7.9. // ii =i2 = i and ji = j'^ / j[ / J2; ii / Jl = J2 / J2; ii / J2 / ii = J2^ then 
1 ^ 1 ^ -^ ^ Eii.j_^j'j^j^{si,ti,S2,t2;li,li, 12,12) 



1 otherwise 

1 ■^ 1 Y^ Y^ Y^ Eii.j^jij2j^{si,ti,S2,t2;li,l'i,l2J2) 
"" fe "^ .i^.fe^.^ /.Ill ^ii 9i^^Mh)s{s2)9{t2) 

I otherwise 

1^1 ^ Y^ Y^ Eii.j^jij^ji^{si,ti,S2,t2;h,li,h,l2) 

^'k^t ,,^h^,^ ,iii ,iii ^(^i)^(^i)^(^2)^(*2) 

\0{^) ^f\t^-t2\<^^ 



(93) 



(94) 



^n/im^ "J ^"^ "^1 — 2 

otherwise 



(95) 
The following lemma gives expressions and the corresponding bounds for the terms Z5 and Zg. 
Lemma 7.10. If $i_1 = i_2 = i$ and $j_1 = j_2 \ne j_1' = j_2'$, or $j_1 = j_2' \ne j_1' = j_2$, then



J- ^^ X ^A ■^ Y^ l^ii;jij[J2J'2i^i'tl,S2,t2;h,li, 12,12) 



1 A 1 






9{si)g{ti)g{s2)g{t2) 



if max{|si - S2I, |ti - t2|} < 4^ 



Q( 1 'I 

"^ nhrri'^ ' 

0{^) ^f \si-S2\>f and \h -t2\<f 
otherwise: 



if \si - S2I < ^ and \ti - ^2! > ^ 



(96) 



1 ^ 1 ^\ ■^ ^ Eii.j^j'j^j^{si,ti,S2,t2;ll,li,l2il2) 

i=l * Jl=j'2f^j[=J2h,l[ = ll2,l'2 = ^ 



9{si)g{ti)g{s2)g{t2) 



^ nhm'^ ' 
^ nhm^ ' 





i/ max{|si - t2|, 1^2 - ti|} < 4^ 
if \S\ - t2| < 4^ a?^C? |S2 - til > ^ 



A/i 



A?t 



i/ \s\ — t2| > ^ O'^t? |'S2 — tl| < 2 

otherwise. 



(97) 



Proof of Proposition 7.2 : Define, 

-^n«2;ii j2(*i' *i' ^2, ^2; ^1, ^2) 
-2 ^^2 



^Klil^*2i2^(si+tl)/2,/l(^nil)^(s2+t2)/2,/2(^*2J2)]'3h( 



Si +ti 



Sh)Qhi 



Si +tl 



Si,) (98) 



and 



Gi^i2;ji,j[,J2i^l,h,S2,t2;ll,l'l,l2) ■■- E[yijjjyjjj/yJ2J2-^Si,/i(rJijJi^J^_i'^(Tj^j/)K(g2+42)/2,/2(^*2i2)] 

• Qhisi - si,)Q,,{ti - si>^)Qf,{{s2 + t2)/2 - Si,). (99) 
First, \i ii = i2 = i then, with * denoting Fii-jj^j,{si,ti, S2,t2;li,l2), 

'^i -^n i-'n 1 ^^i -^?i -t^ri 



. , I < -J ""1 -^n -L-'n 1 ""i -tj?i j^n 

* ii7^i2 ^1=1 '2=1 * ii=i2h=i'2=i 



Next, if ii 7^ ^2 then, with * denoting Fi-^i,-jj^j,{si,ti,S2,t2;h,l2), 

mi-, rrii^ j j 



2 2 mi^vrii. 



Li-, ULTr, I I 

-J ^1 ^2 -^n -^n 

m . Z-^ Z—J Z—J Z—J 

ii=ii2=Hi=H2=i 



(100) 



(101) 
Next, if $i_1 = i_2 = i$ then, with $\star$ denoting $G_{ii;j_1j_1'j_2}(s_1,t_1,s_2,t_2;l_1,l_1',l_2)$,

' hi^3[^i2h,l[=ll2 = l ' j,=j[^J2h,l[ = ll2=l 

m^ Z-^ Z_— / Z_— / rn^ ^-^ ^-^ ^-^ 



Tfl- Tfl- 



-1 ^^i -^n -^n 

3 E E E- w 



* ii=i;=i2ii,«'i=i«2=i 



Finally, if ii 7^ 22, then, with -k denoting Gi-^^^.j-^ji j^{si,ti, S2, ^2; h, I'l, h 

E[[/,,(si,ti)y.,(^^)] 



T^E E E E* +^^E E E E- 

if rrii^ "^-^ "^-^ ^^ ^^ mf mj, ^^ ^^ ^^ -^^ 



Arguments similar to those employed earlier show that the sums corresponding to ji = j[ in (102) 
and (103) do not contribute anything to E || HyCcipv |P- 

We first consider E(yjj (2:1)^42 (22)). Then Lemmas 7.11 and 7.12, stated below, give expressions 
for the leading term and the term Z7 (and corresponding bound), respectively, in (64). 

Lemma 7.11. If ii 7^ 12 or ii = 12, and ji 7^ J2 then for \sk — ifc| < ^, k = 1,2, with A > 
4{Bk + Cq), 

^ n ^ Trl-i Ln Ln jp / , 4- 1 1 \ 

1 V^ _j_ V^ sr^ sr^ ^n;ji,J2\^lj'^1^^2,t2',l'l,h) 
Z—^ m'^ Z—^ ^_— / Z—/ 



1 >r^ 1 ■^ ■^ Y^ Y^ -^iili2;jl,i2('Sl;^l)g2,^2;^l,^2) 

ii+ii *' ^'^ ii=ii2=i«i=U2=i yv lyyv i; 

-a2[E(C,(zi)) +E(a(z2))] + (7^ 



= C7(5i,ti)C(s2,i2)- ( AE— 1 (C(si,ti) + a2)(C(s2,t2)+^')+0(/i2) 

(1 1 '^ 1 1 " \ 

-(1--V— ) + — V pli, (C(S1, S2)C(tl, ts) + C{si,t2)Cis2, ti) + O{h)).il04) 
i=l * ii^i2 / 

Lemma 7.12. If ii = 12 = i, j'l = J2, then 

1 -A 1 Y^ -^y^ Fii.j,j,{si,ti,S2,t2;h,l2) ^ iO{j^) if\zi-Z2\<^ 
n2 ^ m^ ^ ;^ ^ g{zi)giz2) \ otherwise. 
Finally, consider the term $E(U_{i_1}(s_1,t_1)V_{i_2}(z_2))$. Lemma 7.13 gives an expression for the leading
term in (65), and Lemma 7.14 gives expressions and the corresponding bounds for the terms $Z_8$ and $Z_9$.

Lemma 7.13. Ifii / 12, Ji / Ji/ h = 12, h / il i' h, then for |si-ti| > ^, and \s2-t2\ < ^, 
with A>A{Bk+Cq), 

J_^ f ■.J_ ^ sr ^ Gii.j^^j,^^j^{siM.S2M;hAM) 

2^u;(m,) 2^ 1^ 1^ gisi)giti)giz2) 

1 ^ 1 ^ ^A ^ ^Gi^i^.j^j>j^{si,ti,S2,t2]li,l'iM) 2j.ni ^^ 

= (C(S1, ii) + 0(/l2))(C(s2, t2) + 0(/l')) 

- f A E — I (^(«i' *i) + 0{h'')){C{s2M) + <y^ + 0{h^)) 

/1 1 " 2 1 '^ A 

+ -(1--E— ) + - E /'n^^ iCisuS2)Citi,t2) + Cisi,t2)Cis2,ti) + Oih)). (106) 

Lemma 7.14. // ii =12 = 1 and j[ / ji = j2, ji / j'l = J2, then 

1 ■^ , , 1 ■^ v^ v^ Gii.j^j>j^{si,ti,s2,t2;h,li,h) 
;3E^("^^)Z3 E E E 



^2 Z^ ^ Z^ Z-^ Z-/ g(si)q(ti)g(z2 



O(^) ^/ki-S2| 



< 



Ah 



otherwise. 



(107) 



1 Y^ 1 ■^ ■^ Y^ Gjj.jj j/ ,,-2(si)il5 52)*2;^l)^l)^2) 

;^Z.M"^^);^ Z. Z. Z. 5(^1)9(^1)5(^2) 

^ |0(;i) ^f\tl-S2\<f 
I otherwise. 

Details of the calculation of pointwise variance (32) 
Proof of Proposition 7.3 : Consider first 

W~^ {s,t)Ya.T{C{s,t)) = W~^ {s,t)E{C{s,t)f - {E{W~^ {s,t)C{s,t))f. 

ILfi ILfi ILfi 



(108) 
Using (59), (88) and the arguments leading to (91), we have

'■1 '■2 ^n ^n 

n 



'^h„'^^'*''„2 Z^ Tr,2 ^2 Z^ Z^ Z^ 2-^ {g{s)g{t)y 



-(E(W^Js,t)C(s,t)))2 

w~ (s t)— V ^^^'^'^^^'^^ V V V V 



n^ 



E[C(ri^j, , Ti^j'jKs^i-^ {Ti^jj )Kt,i[ {Ti^j[ )] IE[C(rJ2J2 , Ti^j'^)Ks^i^ {Ti^J2)Kt,v^ {^i2j'2 )] 



g{s)g{t) 

E w{mi) ^ ^ 



mi L„ 



9{s)g{t) 
9is)git) 



1 '^ 

+^/i„(«' *);3 E ^^^2 ((c^(^' «) + oih^)){Cit, t) + o(/i2)) + ids, t) + o{h^)f) 



n^ 



^ , . 1 



«l^«2 



2\\2 



"'71 n 



n 



+^/i„(«' *);^ E /'n^2 ((C(^, s) + 0{h^)){C{t, t) + 0{h^)) + (C(s, t) + 0(/i2))2) 

n7^«2 

= 0(i) + (^EpL)0(l). (109) 

\ ii^i2 / 

Combining (109) with (91) and (92) - (97), we obtain (66). 
Proof of Proposition 7.4 : Write 

W~^^{s,t)Y^T{C.C-±^)) = W~^^{s,m{C.C-^)f - (E(t^^J.,t)a(^)))2, 

and observe that, by (63), (98) and (104), and following steps very similar to those leading to (109), 
we have 



mi-, mi„ T T 

^1 '2 ^n ^n 



wr {s,t)\ y -^— yyyy 

il^i2 ^ 2 il = lJ2 = Ul=n2=l 



^ili2;jl,J2\'^i '^' '''I '^i 'l' ^2 j 

WW 



S + t, 



{E{Wr(s,t)a{^-))) 



\ i\i=-i2 I 

= o(i)+(^i:^?.«)oW' ("») 

\ n7^«2 / 

Combining (110) with the steps leading to (104) and (105), we obtain (67). 
Proofs of Lemmas 7.7 - 7.14

Proof of Lemma 7.7 : Since $\rho_{ii} = 1$, from expressions (112) and (115) we can treat the terms
corresponding to $i_1 = i_2 = i$ and $i_1 \ne i_2$ in a unified way. From (120) and (121), the expression
(122) and the calculations leading to (50), (91) follows.

Proof of Lemma 7.8 : It follows from (116), (123) and (128) (taking $s = s_1$, $s' = s_2$, $t = t_1$ and
$t' = t_2$ in the latter).

Proof of Lemma 7.9 : Follows by arguments analogous to those for deriving (92). 

Proof of Lemma 7.10 : Follows from (118), (123) and (126). 

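Proof of Lemma 7.11 : By (114) and (118),

\[
E[Y_{i_1j_1}^2Y_{i_2j_2}^2 \mid \mathbf{T}] = (C(T_{i_1j_1},T_{i_1j_1}) + \sigma^2)(C(T_{i_2j_2},T_{i_2j_2}) + \sigma^2) + 2\rho_{i_1i_2}^2(C(T_{i_1j_1},T_{i_2j_2}))^2. \tag{111}
\]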
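The structure of (111), a product of variances plus twice a squared covariance, is the standard fourth-moment identity for centred jointly Gaussian variables. The sketch below checks that identity in exact rational arithmetic; it is an illustration under the assumption of joint Gaussianity, with hypothetical function names, and uses the regression decomposition $B = (c/a)A + Z$ with $Z$ independent of $A$ together with $E[A^4] = 3a^2$.

```python
from fractions import Fraction


def second_moment_product(var_a, var_b, cov_ab):
    """E[A^2 B^2] for centred jointly Gaussian (A, B), via B = (c/a) A + Z.

    Z is centred Gaussian, independent of A, with variance b - c^2 / a.
    """
    a, b, c = Fraction(var_a), Fraction(var_b), Fraction(cov_ab)
    slope = c / a
    var_z = b - c * c / a
    # E[A^2 (slope*A + Z)^2] = slope^2 E[A^4] + E[A^2] E[Z^2]; the cross term vanishes.
    return slope * slope * 3 * a * a + a * var_z


def product_plus_twice_squared_cov(var_a, var_b, cov_ab):
    """Right-hand side of the identity: Var(A) Var(B) + 2 Cov(A, B)^2."""
    a, b, c = Fraction(var_a), Fraction(var_b), Fraction(cov_ab)
    return a * b + 2 * c * c


if __name__ == "__main__":
    cases = [(2, 3, 1), (Fraction(5, 4), Fraction(7, 3), Fraction(-1, 2)), (4, 4, 4)]
    for case in cases:
        print(case, second_moment_product(*case), product_plus_twice_squared_cov(*case))
```

Taking both variances and the covariance equal, as in the last case, recovers the fourth moment $3a^2$ of a single Gaussian variable, which is the form used for the terms with coinciding measurement indices.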

The expression for $E[(C(T_{i_1j_1},T_{i_2j_2}))^2K_{z_1,l_1}(T_{i_1j_1})K_{z_2,l_2}(T_{i_2j_2})]$ is given by

\[
\iint (C(u,v))^2g(u)g(v)K_{z_1,l_1}(u)K_{z_2,l_2}(v)\,du\,dv,
\]

and it can be shown that when we sum over $l_1, l_2 = 1,\ldots,L_n$, the sum equals $(C(z_1,z_2))^2g(z_1)g(z_2) +
O(h^2)$. From this, and the calculations leading to (51), we have, for $|s_k - t_k| \le \frac{Ah}{2}$, $k = 1, 2$, with
$A > 4(B_K + C_Q)$,



1 V^ J_ Sr^ \~^ Sr^ -^M;jij2(sii ^1; S2-, ^2; h-,h) 

I J_ v^ 1 v^ v^ v^ v^ -F'2ii2;ji,i2('5i?^i7'52,^2; ^1,^2) 

ii~i Z -^ -^ ry-to . ivy) . -^ -^ ^ -^ ^ -^ ^ -^ 



n2 /^ mi,mi2 f^ :^ ^ ^ 5(^1)^(22) 

-a2[E(C,(zi))+E(C,(z2))]+^^ 

= ( -(1 - - E — ) + ^^^ ) (C(^i, ^1) + ^' + 0(/i2))(C(z2, ^2) + CT^ + 0(/i2)) 
\ n n ■^^ rrii n I 

/ 1 " 1 1 " \ 

+ -(1--E— ) + - E /'n^^ (2(C(zi,Z2))2 + 0(/.2)) 
I n n ^-^ rrii n^ ^-^ I 

\ *=i n^«2 / 

-a\C{zuZi) + C;(Z2, Z2) + 2^2 + 0(/i2)) + a^ 

(1 1 " 1 1 " \ 

-(1--E— ) + -E /'n*2 (C(^l, ^2)C(tl, t2) + C(S1, t2)C(s2, tl) + 0(/l)), 
«=l n^«2 / 



-a2(C(si, ii) + C(S2, i2)) - ^^ + 0(/i') 

C(S1, tl)C7(s2, t2) - ( ^ E — ) (C(S1, ti) + Ct2)(C7(s2, t2) + a') + ©(/i^ 



71^ '^ — ' rrii 



1 11 1 

+ 1 -(1--E— ) + - E/'n*^ I (C(5l,^2)C(ti,t2) + C(si,t2)C(s2,tl)+0(/i)). 

I n n '^^ rrii f^ '^ ' 

j = l 417^42 

Proof of Lemma 7.12 : Note that $E(Y_{ij}^4 \mid T_i) = 3(C(T_{ij},T_{ij}) + \sigma^2)^2$. Thus, from (129), we have



-1^ if U, _ ■yj <r 



0{h-^) if|zi-Z2|< — 



2311' 



I otherwise 

uniformly in $s_1, t_1, s_2, t_2 \in [0,1]$ and $1 \le j_i \le m_i$, $1 \le i \le n$. Therefore (105) follows.
Proof of Lemma 7.13 : By (113) and (116), 

The expression for $E[C(T_{i_1j_1},T_{i_2j_2}) C(T_{i_1j_1'},T_{i_2j_2}) K_{s_1,l_1}(T_{i_1j_1}) K_{t_1,l_1'}(T_{i_1j_1'}) K_{z_2,l_2}(T_{i_2j_2})]$ is given by

$$\iiint C(u,w) C(v,w) g(u) g(v) g(w) K_{s_1,l_1}(u) K_{t_1,l_1'}(v) K_{z_2,l_2}(w)\,du\,dv\,dw,$$

and it can be shown that when we sum this over $l_1, l_1', l_2 = 1, \ldots, L_n$, the sum equals

$$C(s_1,z_2) C(t_1,z_2) g(s_1) g(t_1) g(z_2) + O(h^2).$$

From this, and similar arguments as before, we have, for $|s_1 - t_1| > Ah/2$ and $|s_2 - t_2| \le Ah/2$, with $A > 4(B_K + C_Q)$,

1 ■^ , - 1 ■^ >A >-^ Gii.j-^j'_^j2{si,ti,S2,t2;h,l[,l2) 
-2 2^^{m)^ }_^ l^^ 9{si)9{h)9{z2) 

*-i ji¥'ri¥'nh,ii=ih=i 

1 Y^ 1 Y^ Y^ ^ ^ <-^ni2;ji,iJj2V'5l)tl) ■52,12, tl,'i,'2j 



-a'^^C{si,ti) 
1 1 " 9 

-(1 - - V — )(C(S1, ti) + 0(/l2))(C(z2, Z2) + ^2 + 0(/i2)) 

n n '^^ m,- 

+ „2 2^ ^i"^nJ 2 2^ 2^ g{si)g{h) 

i2 = li2 = l 



"^i2 fr'i ,^ 5(^2) 



— cr — > wimi) — ^ > > — - — -- — 7 ^— 

n^ ^ ' m^ ^ ^ g(si)g(ti) 

/ 1 "^ 2 1 " \ 

I n n ^-^ rrii n^ ^-^ I 

\ «=1 «l^«2 / 

(1 1 " 9 — 1 \ 
-(1 - - E — ) + (C(si, ti) + 0(/i'))(C(z2, ^2) + a^ + 0{h^)) 
n n ^ rrii n I 

-a\C{si,h) + 0{h^)) 

/1 1 '^ 2 1 " \ 

\ «=1 «l^«2 / 

= (C7(si, ti) + 0{h^)){C{s2,t2) + 0{h^)) 

- ( A E — ) (C(^i, *i) + 0{h^)){Cis2,t2) + a^ + 0{h^)) 

(\ 1 " 2 1 " \ 

+ -(1--E— ) + -E/'?i*2 (^(«l'«2)C(ti,t2)+C(.l,t2)C(s2,tl) + 0(/i)). 

\ «=l n7^«2 / 

The last equality follows from the fact that the terms $(C(s_1,t_1) + O(h^2))$ appearing in lines four, nine and ten are the same.

Proof of Lemma 7.14 : Follows from (117) and (127). 
Proof of (72)



Define $W_{ij} = \psi_k(T_{ij}) B_i(s,T_{ij})$ and $\widetilde W_{ij} = \psi_{k'}(T_{ij}) B_i(t,T_{ij})$. Since $|s - t| > Ah/2$, it follows that for all $i$, $W_{ij}^{k} \widetilde W_{ij}^{l} = 0$ for all $k, l \ge 1$ and all $j = 1, \ldots, m_i$. Thus, if $|s - t| > Ah/2$, then

nii 



4-1 



nii 

+ Yl [^(^^ii ^ij'i ^iJi ^^n ) - nw^n Wif^ MWif^ Wij, )] 
ii=j2^i;=i2 

'rrii 

+ Y [nw^nW,,,Wi,,w,,,) - nw^nWi,,;)nwi,,Wij,)] 

Jl=j'2¥=j{¥=J2 
rrii 

+ Y [E(W^i,i%W^i,;%) - E(t^,,,%)E(T^,,j%)] 

rrii 

+ Y i^iWi,,w^^,Win) - nwij,w,^onw^,2W^JX 

j'l=j'2¥=Jl¥=J2 

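since the term corresponding to $j_1 \neq j_1' \neq j_2 \neq j_2'$ vanishes.

Now by using the fact that the $T_{ij}$'s are i.i.d., we can simplify each sum on the RHS.

1st term $= m_i(m_i - 1)[E(W_{i1}^2)E(\widetilde W_{i1}^2) - (E(W_{i1}))^2(E(\widetilde W_{i1}))^2]$

2nd term $= m_i(m_i - 1)[0 - (E(W_{i1}))^2(E(\widetilde W_{i1}))^2]$

3rd term $= m_i(m_i - 1)(m_i - 2)[E(W_{i1}^2)(E(\widetilde W_{i1}))^2 - (E(W_{i1}))^2(E(\widetilde W_{i1}))^2]$

4th term $= m_i(m_i - 1)(m_i - 2)[0 - (E(W_{i1}))^2(E(\widetilde W_{i1}))^2]$

5th term $= m_i(m_i - 1)(m_i - 2)[0 - (E(W_{i1}))^2(E(\widetilde W_{i1}))^2]$

6th term $= m_i(m_i - 1)(m_i - 2)[(E(W_{i1}))^2 E(\widetilde W_{i1}^2) - (E(W_{i1}))^2(E(\widetilde W_{i1}))^2]$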

Thus, 

mfYar{BuMs)Bii,k'{t)) 
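$= m_i(m_i - 1)[E(W_{i1}^2)E(\widetilde W_{i1}^2) + (m_i - 2)E(W_{i1}^2)(E(\widetilde W_{i1}))^2 + (m_i - 2)(E(W_{i1}))^2 E(\widetilde W_{i1}^2)]$
$\quad - m_i(m_i - 1)(4m_i - 6)(E(W_{i1}))^2(E(\widetilde W_{i1}))^2.$

Now, using the facts that $E(W_{i1}^2) = O(h^{-1}) = E(\widetilde W_{i1}^2)$ and $|E(W_{i1})| = O(1) = |E(\widetilde W_{i1})|$, we conclude (72).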
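
The way the six terms combine can be checked exactly on a toy design. The sketch below is only an illustration under stated assumptions: the design points are taken to be uniform on a small finite set (rather than on [0, 1]), `f` and `g` are hypothetical stand-ins for $W$ and $\widetilde W$ with $f \cdot g = 0$ pointwise, and the quantity compared is the variance of $\big(\sum_j W_{ij}\big)\big(\sum_j \widetilde W_{ij}\big)$.

```python
from fractions import Fraction
from itertools import product


def exact_variance(f, g, m):
    """Var(S * S_tilde) by full enumeration, where S = sum_j f[T_j],
    S_tilde = sum_j g[T_j] and T_1..T_m are i.i.d. uniform on range(len(f))."""
    n_pts = len(f)
    weight = Fraction(1, n_pts ** m)
    mean = Fraction(0)
    mean_sq = Fraction(0)
    for ts in product(range(n_pts), repeat=m):
        value = sum(f[t] for t in ts) * sum(g[t] for t in ts)
        mean += weight * value
        mean_sq += weight * value * value
    return mean_sq - mean * mean


def closed_form_variance(f, g, m):
    """Sum of the six terms; assumes f[t] * g[t] == 0 for every t."""
    n_pts = len(f)
    e_w = sum(f) / n_pts                      # E(W)
    e_g = sum(g) / n_pts                      # E(W tilde)
    e_w2 = sum(x * x for x in f) / n_pts      # E(W^2)
    e_g2 = sum(x * x for x in g) / n_pts      # E(W tilde^2)
    main = e_w2 * e_g2 + (m - 2) * e_w2 * e_g ** 2 + (m - 2) * e_w ** 2 * e_g2
    return m * (m - 1) * main - m * (m - 1) * (4 * m - 6) * e_w ** 2 * e_g ** 2


# Hypothetical toy functions with disjoint supports.
F_VALUES = [Fraction(1), Fraction(2), Fraction(0), Fraction(0)]
G_VALUES = [Fraction(0), Fraction(0), Fraction(1, 2), Fraction(3)]
```

Exact rational arithmetic is used so that the comparison involves no rounding; the coefficient $4m_i - 6$ arises as $1 + 1 + 4(m_i - 2)$ from the six terms.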

Computation of conditional mixed moments 

The computation of the moments is done by using the Wick formula (Lemma 7.4). We consider 
all the different generic cases below: 
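• Case : $i_1 \neq i_2$, $j_1 \neq j_1'$, $j_2 \neq j_2'$ : In this case,

$$E\big(Y_{i_1j_1} Y_{i_1j_1'} Y_{i_2j_2} Y_{i_2j_2'} \mid T_{i_1}, T_{i_2}\big) = C(T_{i_1j_1},T_{i_1j_1'}) C(T_{i_2j_2},T_{i_2j_2'})$$

$$\quad + \rho_{i_1i_2}^2 \big[ C(T_{i_1j_1},T_{i_2j_2}) C(T_{i_1j_1'},T_{i_2j_2'}) + C(T_{i_1j_1},T_{i_2j_2'}) C(T_{i_1j_1'},T_{i_2j_2}) \big]. \tag{112}$$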

• Case : $i_1 \neq i_2$, $j_1 = j_1'$, $j_2 \neq j_2'$ (equivalent to $i_1 \neq i_2$, $j_1 \neq j_1'$, $j_2 = j_2'$): In this case,

Therefore, by (42), 

'^\^iljl^i2J2-^i2J'2\ -'-H' -'-«2) ~ ^V-'«lil' ^ili\)^\^i2J2^ -'J2jV~'~^«l«2^^-'«lil' "' «2i2 )^ (-' iljl ' '''i2J0- 

Combining, we have 

■"^(-''^iljl -^iljj -^«2i2 -^J2i2 I *1 ' *2 J 

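• Case : $i_1 \neq i_2$, $j_1 = j_1'$, $j_2 = j_2'$ : In this case,

$$E\big(Y_{i_1j_1} Y_{i_1j_1'} Y_{i_2j_2} Y_{i_2j_2'} \mid T_{i_1}, T_{i_2}\big) = (C(T_{i_1j_1},T_{i_1j_1}) + \sigma^2)(C(T_{i_2j_2},T_{i_2j_2}) + \sigma^2) + 2\rho_{i_1i_2}^2 (C(T_{i_1j_1},T_{i_2j_2}))^2. \tag{114}$$

• Case : $i_1 = i_2$, $j_1 \neq j_1' \neq j_2 \neq j_2'$ : In this case

$$E\big(Y_{i_1j_1} Y_{i_1j_1'} Y_{i_2j_2} Y_{i_2j_2'} \mid T_{i_1}, T_{i_2}\big) = C(T_{i_1j_1},T_{i_1j_1'}) C(T_{i_1j_2},T_{i_1j_2'}) + C(T_{i_1j_1},T_{i_1j_2}) C(T_{i_1j_1'},T_{i_1j_2'})$$

$$\quad + C(T_{i_1j_1},T_{i_1j_2'}) C(T_{i_1j_1'},T_{i_1j_2}). \tag{115}$$

• Case : $i_1 = i_2$, $j_1 = j_1' \neq j_2 \neq j_2'$ (equivalent to $i_1 = i_2$, $j_1 = j_2 \neq j_1' \neq j_2'$; $i_1 = i_2$, $j_1 = j_2' \neq j_1' \neq j_2$; $i_1 = i_2$, $j_1 \neq j_1' = j_2 \neq j_2'$; $i_1 = i_2$, $j_1 \neq j_1' = j_2' \neq j_2$; and $i_1 = i_2$, $j_1 \neq j_1' \neq j_2 = j_2'$): In this case

$$E\big(Y_{i_1j_1} Y_{i_1j_1'} Y_{i_2j_2} Y_{i_2j_2'} \mid T_{i_1}, T_{i_2}\big) = (C(T_{i_1j_1},T_{i_1j_1}) + \sigma^2) C(T_{i_1j_2},T_{i_1j_2'}) + 2 C(T_{i_1j_1},T_{i_1j_2}) C(T_{i_1j_1},T_{i_1j_2'}). \tag{116}$$

• Case : $i_1 = i_2$, $j_1 = j_1' = j_2 \neq j_2'$ (equivalent to $i_1 = i_2$, $j_1 = j_1' = j_2' \neq j_2$; $i_1 = i_2$, $j_1 = j_2 = j_2' \neq j_1'$; and $i_1 = i_2$, $j_1 \neq j_1' = j_2 = j_2'$): In this case

$$E\big(Y_{i_1j_1} Y_{i_1j_1'} Y_{i_2j_2} Y_{i_2j_2'} \mid T_{i_1}, T_{i_2}\big) = 3 C(T_{i_1j_1},T_{i_1j_1}) C(T_{i_1j_1},T_{i_1j_2'}) + 3\sigma^2 C(T_{i_1j_1},T_{i_1j_2'}). \tag{117}$$

• Case : $i_1 = i_2$, $j_1 = j_1' \neq j_2 = j_2'$ (equivalent to $i_1 = i_2$, $j_1 = j_2 \neq j_1' = j_2'$; and $i_1 = i_2$, $j_1 = j_2' \neq j_1' = j_2$): In this case

$$E\big(Y_{i_1j_1} Y_{i_1j_1'} Y_{i_2j_2} Y_{i_2j_2'} \mid T_{i_1}, T_{i_2}\big) = (C(T_{i_1j_1},T_{i_1j_1}) + \sigma^2)(C(T_{i_1j_2},T_{i_1j_2}) + \sigma^2) + 2 (C(T_{i_1j_1},T_{i_1j_2}))^2. \tag{118}$$

• Case : $i_1 = i_2$, $j_1 = j_1' = j_2 = j_2'$ : In this case

$$E\big(Y_{i_1j_1} Y_{i_1j_1'} Y_{i_2j_2} Y_{i_2j_2'} \mid T_{i_1}, T_{i_2}\big) = 3 (C(T_{i_1j_1},T_{i_1j_1}) + \sigma^2)^2. \tag{119}$$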
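
For a single curve ($i_1 = i_2$), the same-curve cases above are instances of the fourth-moment Wick formula applied to the conditional covariance $C(T_{ij},T_{ij'}) + \sigma^2 1\{j = j'\}$. The following is a minimal sketch of that bookkeeping, not the authors' code; the kernel $\min(s,t)$, the design points and the noise variance are hypothetical choices made only for illustration. It also counts how many coincidence patterns of $(j_1, j_1', j_2, j_2')$ share each block structure, which matches the number of "equivalent" sub-cases listed for (116), (117) and (118).

```python
from collections import Counter
from itertools import product


def wick_fourth_moment(cov, a, b, c, d):
    """E[Y_a Y_b Y_c Y_d] for a centered Gaussian vector with covariance cov."""
    return (cov[a][b] * cov[c][d]
            + cov[a][c] * cov[b][d]
            + cov[a][d] * cov[b][c])


def observation_covariance(times, kernel, sigma2):
    """Conditional covariance of noisy observations on one curve:
    kernel(T_j, T_j') plus sigma2 on the diagonal."""
    size = len(times)
    return [[kernel(times[p], times[q]) + (sigma2 if p == q else 0.0)
             for q in range(size)] for p in range(size)]


def count_coincidence_patterns():
    """Count set partitions of the four indices by sorted block sizes."""
    counts = Counter()
    for labels in product(range(4), repeat=4):
        # Keep only restricted-growth strings: one per set partition.
        if labels[0] != 0:
            continue
        if any(labels[k] > max(labels[:k]) + 1 for k in range(1, 4)):
            continue
        counts[tuple(sorted(Counter(labels).values()))] += 1
    return dict(counts)


# Hypothetical example: Brownian-motion kernel, two design points.
EXAMPLE_COV = observation_covariance([0.3, 0.7], min, 0.5)
```

With these inputs, `wick_fourth_moment(EXAMPLE_COV, 0, 0, 0, 0)` reproduces the form of (119), index pattern `(0, 0, 1, 1)` that of (118), and `(0, 0, 0, 1)` that of (117).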
Computation of unconditional mixed moments (off-diagonal part)

Here, we obtain simplified forms of certain expectations that are used in the proofs of Propositions
7.3 and 7.4. Observe that, based on our calculations in Appendix A, we only need to compute the
expectations of the form

^C{Ti^j-^,Ti,_^ji_^)C{Ti^j^,Ti,^ji^)Ks^^i^{Ti^j^)Ki^^ii^^^^ (120) 

Notice that, when the pairs $(T_{i_1j_1}, T_{i_1j_1'})$ and $(T_{i_2j_2}, T_{i_2j_2'})$ are independent, the expectation in (120)
factorizes as

nC{Ti,,,,T,,^^,^)Ks,,l,{T,,,,)k^^^vST^',^^^^^ (121) 

Each individual term is exactly of the same form that we encountered while calculating the bias of
our estimate. The expectations appearing above are of the form

$$\iint C(u,v) g(u) g(v) K_{s,l}(u) K_{s',l'}(v)\,du\,dv. \tag{122}$$

For other terms we need to evaluate or approximate various other integrals. The general forms of
these integrals are given below, for $1 \le l, l', m, m' \le L_n$ and $s, s', t, t' \in [0,1]$.

{C{u,u)Y g{u)Ks,i{u)Ks> ^i/{u)du 

otherwise 

$$\int C(u,u) g(u) K_{s,l}(u) K_{s',l'}(u) K_{t,m}(u) K_{t',m'}(u)\,du; \tag{124}$$

$$\int (C(u,u))^2 g(u) K_{s,l}(u) K_{s',l'}(u) K_{t,m}(u) K_{t',m'}(u)\,du. \tag{125}$$



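$$\iint (C(u,v))^r g(u) g(v) K_{s,l}(u) K_{s',l'}(u) K_{t,m}(v) K_{t',m'}(v)\,du\,dv = \begin{cases} O(h^{-2}) & \text{if } \max\{|s - s_l|, |s' - s_{l'}|, |t - s_m|, |t' - s_{m'}|\} < 2B_K h \\ 0 & \text{otherwise} \end{cases} \quad \text{for } r = 0, 1, 2; \tag{126}$$

$$\iint (C(u,u))^r C(u,v) g(u) g(v) K_{s,l}(u) K_{s',l'}(u) K_{t,m}(u) K_{t',m'}(v)\,du\,dv = \begin{cases} O(h^{-2}) & \text{if } \max\{|s - s_l|, |s' - s_{l'}|, |t - s_m|\} < 2B_K h \\ 0 & \text{otherwise} \end{cases} \quad \text{for } r = 0, 1; \tag{127}$$

$$\iiint C(u,v) C(u,w) g(u) g(v) g(w) K_{s,l}(u) K_{s',l'}(u) K_{t,m}(v) K_{t',m'}(w)\,du\,dv\,dw = \begin{cases} O(h^{-1}) & \text{if } \max\{|s - s_l|, |s' - s_{l'}|\} < 2B_K h \\ 0 & \text{otherwise.} \end{cases} \tag{128}$$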
Computation of unconditional mixed moments (diagonal and mixed part)

We have the following bound: 

\0(h-^) a \zk - siA < 2BKh, k = 1,2 ^ , , 

= < ^ ' ' '^' - ' ' for r = 0,1, 2. (129) 

I otherwise 

Some error bounds involving Dirac-$\delta$

Here, we provide some key estimates that are crucial to obtaining the overall risk bound. They all
involve the operator $H_\nu$. Due to the decomposition (55) we can reduce the computations of these
bounds to integrals involving $\{\psi_k(\cdot)\}_{k \ge 1}$ and $\delta(\cdot,\cdot)$. Throughout we assume that $R(s_1,s_2,t_1,t_2)$ is
a "nice" function satisfying certain (boundedness) conditions. Then,




(5(x, Sl)(5(x, S2)-R(si, S2, ti,t2)lpu{tl)lpu{i2)dsidS2dtldt2\ 

R{x,x,ti,t2)Mti)Mt2)dtidt2\ < II R llooll ^Pu IlL • (130) 

6{x, Si)6{x, S2)WAh{si, ti)R{si,S2, ti,t2)tpu{ti)ipu{t2)dsids2dtidt2\ 
5{x,Si)5{x,S2) / R{si,S2,ti,t2)llJu{ti)'4)y{t2)dtidt2dsidS2\ 



(si-4^)V0 



(x+^)M 



R{x,x,ti,t2)Mh)Mt2)dtidt2\ < ^/i II i? llooll V'i. IISo ■ (131) 




5{x, Si)5{x, S2)WAh{si,ti)WAh{s2,t2)R{si,S2,ti,t2)tlJu{tl)ll^u{t2)dsidS2dtidt2\ 
6{x,Si)6{x,S2) / / R{si,S2,ti,t2)'ll}u{tl)'^u{t2)dtidt2dsidS2\ 

R{x, X, ti,t2)tpu{tl)ll^uit2)dtidt2\ 



(x+^)Al /■{x+4^)Al 



^2 II D II II „;. Il2 



< (yl/i)^ Ili^llooH. 11^. (132) 

6{x, Si)6{x, S2)WAh{tl,t2)R{si, S2, ti,t2)'lljy{ti)lljy{t2)dsids2dtidt2\ 

r{t2 + ^)M 

6{x,Si)5{x,S2) / R{si,S2,ti,t2)lpv{ti)'lpy{t2)dtidt2dsids2\ 

J{t2~A^)yo 

-(i2 + 4^)Al 

R{x,X,ti,t2)lpu{h)'^u{t2)dtidt2\ 

< ^/i II i? llooll V'^IlL- (133) 





(t2-#)V0 



< 




5{x, Si)6{x, S2)WAh{tl,t2)WAh{s2,t2)R(.Sl, S2,ti,t2)llJu{tl)-4^y{t2)dSidS2dtidt2\ 

5{x,Si)5{x,S2) I I R{si,S2,ti,t2)lpuitl)'llJuit2)dtidt2dsidS2\ 

I2-^)V0 J{t2-^)V0 

R{x, X, ti,t2)lpu{tl)lpuit2)dtidt2\ 



(x+4t)Al p{t2 + ^)Al 



{x~^)yo J(t2-^)vo 



{Ahf II R 



V', 



ooll TU lloo 



(134) 



6{x, Si)6{x, S2)WAh{tl, S2)WAhis2,t2)Risi, S2,ti,t2)lpu{ti)llJi,{t2)dsidS2dtidt2 



5{x,si)6{x,S2) 



(S2 + 4^)A1 /■(S2 + #)A1 



(^2 



^)V0 J(S2 



-)vo 



R{si, S2,ti,t2)lpuitl)'lpuit2)dtidt2dsidS2\ 



{x+4^)Al /.(x+4t)Al 



(x-^)VO J{x-^)VO 



R{x, X, ti,t2)lpu{ti)llJuit2)dtidt2 



< {Ahf II R 



V', 



oo II H^i/ lloo 



(135) 



Appendix G : Proof of Theorem 4.3 

In order to prove this result, we use a strategy very similar to the one used in the proof of Corollary 
1 in Paul and Peng (2007). In view of the statement of the theorem, it suffices to consider a 
submodel consisting of kernels $\Sigma$ of rank 1. Let

$$\Sigma^{(0)}(s,t) = \lambda\,\psi(s)\psi(t), \qquad s,t \in [0,1],$$

for $\lambda > C_1$, where $\psi(\cdot) \equiv 1$. Then $\psi$ is the first (and only) eigenfunction of $\Sigma^{(0)}$ with corresponding
eigenvalue $\lambda$. Let us suppose that the design $D$ satisfies $\underline{m} = \overline{m} = m \ge 4$. Finally, choose $g$ to be
the uniform density on [0,1]. Let M^ ~ (nm)^'^, and let {7;};^*^ be orthonormal functions such 

that (i) 7;'s are twice continuously differentiable, and max; || 7j ||oo= 0{M^ ■'), for j = 0, 1,2; 
(ii) Jq ^i{s)ds = for all I, and (iii) 7; is centered around l/M^ with length of support 0{M~^) 
uniformly over /. Note that, condition (iii) implies that {7;} are orthogonal to t/j. Let Mq = [— g^]. 
Let Tq be an index set satisfying log |J^ol ^ -^*; ^-^id {z^ : / = 1, . . . , M^,}jgjPo be a collection with 



Ji) 



-1/2 



\ / '> l/y /■■^ 

taking values in {—Mq , 0, Mq }, such that with z"^ denoting the vector (, 



^ ||2= 1 and 



_ we have 
2;U) _ z^^ > ||2> 1 for j 7^ / G J^Q. The construction is by a "sphere packing" 



■I )l=l^ 



argument as in Paul and Johnstone (2007). Let 6 x (iim) ^'^ x M^ ^ Then, define 

1=1 

Note that by construction, (i') $\|\bar\psi^{(j)}\|_2 = 1$; (ii') $\bar\psi^{(j)}$ is twice differentiable, with second derivative
bounded; (iii') $\|\bar\psi^{(j)} - \bar\psi^{(j')}\|_2 \ge \delta$ for $j \neq j' \in \mathcal{F}_0$; (iv') $\|\bar\psi^{(j)} - \psi\|_\infty = O(\delta)$ uniformly over $j \in \mathcal{F}_0$.
Property (iv') will be crucial for much of our analysis later on.
In order to prove Theorem 4.3, we need to show the following:



^Ei^(s|^\sr) X nm(5^ uniformly in j e Tq, 



(136) 



i=l 
where $\Sigma_i^{(j)}$ denotes the covariance of observation $i$ given $\{T_{il}\}_{l=1}^{m}$ under the model parameterized
by $\Sigma_0^{(j)}$, and $E$ denotes expectation with respect to the design points $T$.

Proof of (136) 

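From now onwards, we shall fix $j \in \mathcal{F}_0$, and drop the superscript $(j)$ for convenience. Denote the
$m \times 1$ vectors $(\bar\psi(T_{ij}))_{j=1}^m$ and $(\psi(T_{ij}))_{j=1}^m$ by $\bar\psi_i$ and $\psi_i$, respectively. Of course, $\psi_i$ is the nonrandom
vector with all the entries equal to 1. Next, observe that,

$$\|\Sigma_i^{(0)} - \Sigma_i\|_F = \lambda \|\bar\psi_i(\bar\psi_i - \psi_i)^T + (\bar\psi_i - \psi_i)\psi_i^T\|_F \le \lambda(\|\bar\psi_i\|_2 + \|\psi_i\|_2)\|\bar\psi_i - \psi_i\|_2. \tag{137}$$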

Since $\|\bar\psi_i - \psi_i\|_2 \le \sqrt{m}\,\|\bar\psi - \psi\|_\infty = O(\sqrt{m}\,\delta)$ (by property (iv')), and $\|\psi_i\|_2 = \sqrt{m}$, from (137)
it follows that,

$$\max_{1 \le i \le n} \|\Sigma_i^{(0)} - \Sigma_i\|_F^2 = O(m^2\delta^2). \tag{138}$$
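
The step from (137) to (138) rests on the elementary bound $\|aa^T - bb^T\|_F \le (\|a\|_2 + \|b\|_2)\|a - b\|_2$, which follows from the triangle inequality and $\|xy^T\|_F = \|x\|_2\|y\|_2$. A small numerical sketch, with arbitrary illustrative vectors standing in for $\bar\psi_i$ and $\psi_i$:

```python
import math


def euclidean_norm(vec):
    return math.sqrt(sum(x * x for x in vec))


def rank_one_gap(a, b):
    """Frobenius norm of a a^T - b b^T."""
    size = len(a)
    return math.sqrt(sum((a[p] * a[q] - b[p] * b[q]) ** 2
                         for p in range(size) for q in range(size)))


def rank_one_bound(a, b):
    """(||a|| + ||b||) * ||a - b||, the right-hand side of the bound."""
    diff = [x - y for x, y in zip(a, b)]
    return (euclidean_norm(a) + euclidean_norm(b)) * euclidean_norm(diff)
```

For example, with `b = [1.0] * 5` and `a` a small perturbation of it, `rank_one_gap(a, b)` stays below `rank_one_bound(a, b)`, and both scale linearly in the size of the perturbation.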

Since $m\delta \asymp m(nm)^{-2/5}$ and $m = o(n^{2/3})$, the RHS of (138) is $o(1)$ (a nonrandom bound) uniformly
over $\mathcal{F}_0$, and hence, using arguments as in the proof of Proposition 2 in Paul and Peng (2007), we
have



5;K(E„sf) X 5:i|(sfVV2(sf) _s,)(sfVV2|||^ uniformly over ^o- 



Thus, (136) will follow once we prove: 
Proposition 7.5. Uniformly over $\mathcal{F}_0$,

$$E\|(\Sigma_i^{(0)})^{-1/2}(\Sigma_i^{(0)} - \Sigma_i)(\Sigma_i^{(0)})^{-1/2}\|_F^2 \asymp m\delta^2. \tag{139}$$

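Proof of Proposition 7.5 : First, note that $\theta = \psi_i/\sqrt{m}$ is a vector of $\ell_2$ norm 1, and hence, by
using a standard matrix inversion formula,

$$(\Sigma_i^{(0)})^{-1} = (I + \lambda m\,\theta\theta^T)^{-1} = I - \kappa\,\theta\theta^T, \qquad \text{where } \kappa = \frac{\lambda m}{1 + \lambda m}.$$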
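
The inversion formula is the rank-one (Sherman-Morrison) identity $(I + a\,\theta\theta^T)^{-1} = I - \frac{a}{1+a}\theta\theta^T$ for a unit vector $\theta$, with $a = \lambda m$. A quick numerical sketch, assuming unit noise variance as in the display above; the values of $\lambda$ and $m$ are arbitrary:

```python
import math


def inverse_residual(lam, m):
    """Largest absolute entry of (I + lam*m*theta theta^T)(I - kappa*theta theta^T) - I,
    with theta = (1, ..., 1)/sqrt(m) and kappa = lam*m / (1 + lam*m)."""
    theta = [1.0 / math.sqrt(m)] * m
    kappa = lam * m / (1.0 + lam * m)
    sigma0 = [[(1.0 if p == q else 0.0) + lam * m * theta[p] * theta[q]
               for q in range(m)] for p in range(m)]
    candidate = [[(1.0 if p == q else 0.0) - kappa * theta[p] * theta[q]
                  for q in range(m)] for p in range(m)]
    worst = 0.0
    for p in range(m):
        for q in range(m):
            entry = sum(sigma0[p][r] * candidate[r][q] for r in range(m))
            target = 1.0 if p == q else 0.0
            worst = max(worst, abs(entry - target))
    return worst
```

The residual is at the level of floating-point rounding, which is what one expects if $I - \kappa\theta\theta^T$ is the exact inverse.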

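Let $\Delta = \Sigma_i - \Sigma_i^{(0)} = \lambda(\bar\psi_i\bar\psi_i^T - m\,\theta\theta^T)$. Then,

$$\begin{aligned}
\|(\Sigma_i^{(0)})^{-1/2}\Delta(\Sigma_i^{(0)})^{-1/2}\|_F^2
&= \mathrm{tr}\,[(I - \kappa\theta\theta^T)\Delta(I - \kappa\theta\theta^T)\Delta] \\
&= \mathrm{tr}\,[(I - \theta\theta^T)\Delta(I - \theta\theta^T)\Delta] + 2(1-\kappa)\,\theta^T\Delta(I - \theta\theta^T)\Delta\theta + (1-\kappa)^2(\theta^T\Delta\theta)^2 \\
&= \lambda^2\big[\|(I - \theta\theta^T)\bar\psi_i\|_2^4 + 2(1-\kappa)(\theta^T\bar\psi_i)^2\|(I - \theta\theta^T)\bar\psi_i\|_2^2 + (1-\kappa)^2(m - (\theta^T\bar\psi_i)^2)^2\big] \\
&= \lambda^2\big[(\|\bar\psi_i\|_2^2 - (\theta^T\bar\psi_i)^2)^2 + 2(1-\kappa)(\theta^T\bar\psi_i)^2(\|\bar\psi_i\|_2^2 - (\theta^T\bar\psi_i)^2) + (1-\kappa)^2(m - (\theta^T\bar\psi_i)^2)^2\big],
\end{aligned} \tag{140}$$

where the third and last steps follow from the fact that $(I - \theta\theta^T)\theta = 0$ and $(I - \theta\theta^T)^2 = I - \theta\theta^T$.
From (140), the proof will follow once we establish the following results.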
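
The closed form in (140) can be compared against a direct evaluation of $\mathrm{tr}[(I - \kappa\theta\theta^T)\Delta(I - \kappa\theta\theta^T)\Delta]$. The sketch below does this in plain Python for an arbitrary, hypothetical vector `psi_bar` playing the role of $\bar\psi_i$; it assumes $\theta = (1,\ldots,1)^T/\sqrt{m}$, so that $\theta\theta^T$ has all entries $1/m$.

```python
def matmul(left, right):
    size = len(left)
    return [[sum(left[p][r] * right[r][q] for r in range(size))
             for q in range(size)] for p in range(size)]


def direct_trace(lam, psi_bar):
    """tr[(I - kappa theta theta^T) Delta (I - kappa theta theta^T) Delta]."""
    m = len(psi_bar)
    kappa = lam * m / (1.0 + lam * m)
    shrink = [[(1.0 if p == q else 0.0) - kappa / m for q in range(m)]
              for p in range(m)]
    # Delta = lam * (psi_bar psi_bar^T - m theta theta^T); m theta theta^T is all ones.
    delta = [[lam * (psi_bar[p] * psi_bar[q] - 1.0) for q in range(m)]
             for p in range(m)]
    half = matmul(shrink, delta)
    full = matmul(half, half)
    return sum(full[p][p] for p in range(m))


def closed_form(lam, psi_bar):
    """Last line of (140)."""
    m = len(psi_bar)
    kappa = lam * m / (1.0 + lam * m)
    proj_sq = sum(psi_bar) ** 2 / m           # (theta^T psi_bar)^2
    resid = sum(x * x for x in psi_bar) - proj_sq
    return lam ** 2 * (resid ** 2
                       + 2 * (1 - kappa) * proj_sq * resid
                       + (1 - kappa) ** 2 * (m - proj_sq) ** 2)
```

The two functions agree up to rounding for any input vector, since (140) is an algebraic identity rather than an approximation.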
Lemma 7.15. With $T_{ij}$ i.i.d. from Uniform[0, 1], we have (uniformly over $\mathcal{F}_0$)

$$E[m\|\bar\psi_1\|_2^2 - (\psi_1^T\bar\psi_1)^2] = m(m-1)\delta^2(1 + o(1)),$$
$$\text{and} \quad \mathrm{Var}[m\|\bar\psi_1\|_2^2 - (\psi_1^T\bar\psi_1)^2] = O(m^3\delta^4).$$

Lemma 7.16. With $T_{ij}$ i.i.d. from Uniform[0, 1], we have (uniformly over $\mathcal{F}_0$)

$$E[m - \psi_1^T\bar\psi_1]^2 = m\delta^2(1 + o(1)),$$
$$E\|\bar\psi_1 - \psi_1\|_2^4 = O(m^2\delta^4).$$

Lemma 7.17. With $T_{ij}$ i.i.d. from Uniform[0, 1], we have (uniformly over $\mathcal{F}_0$)

$$E[\|\bar\psi_1\|_2^2(m\|\bar\psi_1\|_2^2 - (\psi_1^T\bar\psi_1)^2)] = m^2(m-1)\delta^2(1 + o(1)).$$

To see how (139) follows from Lemmas 7.15 - 7.17, note first that,

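$$\begin{aligned}
E(\psi_1^T\bar\psi_1)^2(m\|\bar\psi_1\|_2^2 - (\psi_1^T\bar\psi_1)^2)
&= m\,E[\|\bar\psi_1\|_2^2(m\|\bar\psi_1\|_2^2 - (\psi_1^T\bar\psi_1)^2)] - E(m\|\bar\psi_1\|_2^2 - (\psi_1^T\bar\psi_1)^2)^2 \\
&= m^3(m-1)\delta^2(1 + o(1)) - O(m^4\delta^4) = m^3(m-1)\delta^2(1 + o(1)),
\end{aligned} \tag{141}$$

by Lemmas 7.15 and 7.17. Now, from (140) we obtain

$$E\|(\Sigma_1^{(0)})^{-1/2}(\Sigma_1^{(0)} - \Sigma_1)(\Sigma_1^{(0)})^{-1/2}\|_F^2 \ge \frac{2\lambda^2(1-\kappa)}{m^2}\,E(\psi_1^T\bar\psi_1)^2(m\|\bar\psi_1\|_2^2 - (\psi_1^T\bar\psi_1)^2) = \frac{2\lambda^2\,m(m-1)}{1 + \lambda m}\,\delta^2(1 + o(1)),$$

where the last step is by (141). This establishes the lower bound in (139).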

To establish the upper bound in (139), we also need to consider the expectations of the other 
two terms on the RHS of (140). First, by Lemma 7.15, 

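$$\lambda^2 E(\|\bar\psi_1\|_2^2 - (\theta^T\bar\psi_1)^2)^2 = \frac{\lambda^2}{m^2}\,E(m\|\bar\psi_1\|_2^2 - (\psi_1^T\bar\psi_1)^2)^2 = O(m^2\delta^4). \tag{142}$$

Next, writing

$$m - (\theta^T\bar\psi_1)^2 = m - \|\bar\psi_1\|_2^2 + \frac{1}{m}[m\|\bar\psi_1\|_2^2 - (\psi_1^T\bar\psi_1)^2],$$

and then using the fact that for any $\epsilon > 0$ and $a, b \in \mathbb{R}$, $(a + b)^2 \le (1 + \epsilon)a^2 + (1 + \epsilon^{-1})b^2$, we have,
for arbitrary but fixed $\epsilon > 0$,

$$\begin{aligned}
E(m - (\theta^T\bar\psi_1)^2)^2
&\le (1 + \epsilon)E[m - \|\bar\psi_1\|_2^2]^2 + \frac{1 + \epsilon^{-1}}{m^2}E[m\|\bar\psi_1\|_2^2 - (\psi_1^T\bar\psi_1)^2]^2 \\
&= (1 + \epsilon)E[\|\bar\psi_1 - \psi_1\|_2^2 - 2(m - \psi_1^T\bar\psi_1)]^2 + O(m^2\delta^4) \quad \text{(by Lemma 7.15)} \\
&\le 4(1 + \epsilon)^2 E[m - \psi_1^T\bar\psi_1]^2 + (1 + \epsilon)(1 + \epsilon^{-1})E\|\bar\psi_1 - \psi_1\|_2^4 + O(m^2\delta^4) \\
&= 4(1 + \epsilon)^2 m\delta^2(1 + o(1)) + O(m^2\delta^4),
\end{aligned} \tag{143}$$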
where the last step follows from Lemma 7.16. Finally, substituting (142), (141) and (143) in (140)
we obtain an upper bound of the form

$$\frac{2\lambda^2\,m(m-1)}{1 + \lambda m}\,\delta^2(1 + o(1)) + (1 + \epsilon)^2\frac{4\lambda^2 m}{(1 + \lambda m)^2}\,\delta^2(1 + o(1)) + O(m^2\delta^4) = O(m\delta^2),$$

which concludes the proof.
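
The elementary inequality used in deriving (143) holds because the gap $(1+\epsilon)a^2 + (1+\epsilon^{-1})b^2 - (a+b)^2$ equals the perfect square $(\sqrt{\epsilon}\,a - b/\sqrt{\epsilon})^2$. A short numerical sketch of both facts, with arbitrary test values:

```python
import math


def weighted_square_gap(a, b, eps):
    """(1 + eps) a^2 + (1 + 1/eps) b^2 - (a + b)^2, which should be >= 0."""
    return (1 + eps) * a * a + (1 + 1 / eps) * b * b - (a + b) ** 2


def perfect_square_form(a, b, eps):
    """The same gap written as (sqrt(eps) a - b / sqrt(eps))^2."""
    return (math.sqrt(eps) * a - b / math.sqrt(eps)) ** 2
```

Taking $\epsilon$ small keeps the constant in front of the leading term close to 1 at the cost of a large constant in front of the lower-order term, which is the trade-off exploited in (143).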

Proof of Lemmas 7.15 - 7.17 

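In order to prove the lemmas, we define $\xi = \psi - \bar\psi$ and notice the very important set of relations :

$$\int\xi = \int(\psi - \bar\psi) = \int(\psi - \bar\psi)\psi = 1 - \int\psi\bar\psi = \frac{1}{2}\int(\psi - \bar\psi)^2 = \frac{1}{2}\int\xi^2 = \frac{1}{2}\delta^2 + O(\delta^4). \tag{144}$$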
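
The chain of equalities in (144) uses only $\psi \equiv 1$ and $\|\psi\|_2 = \|\bar\psi\|_2 = 1$. The sketch below checks it by midpoint quadrature for one hypothetical unit-norm perturbation, $\bar\psi(t) = \sqrt{1 - \delta^2} + \delta\sqrt{2}\cos(2\pi t)$; this particular form is an assumption made for illustration and is not claimed to be the construction used in the proof.

```python
import math


def perturbed_eigenfunction(t, delta):
    """Hypothetical unit-norm perturbation of psi = 1 on [0, 1]."""
    return math.sqrt(1 - delta ** 2) + delta * math.sqrt(2) * math.cos(2 * math.pi * t)


def midpoint_integral(func, n_cells=2000):
    """Midpoint-rule approximation of the integral of func over [0, 1]."""
    return sum(func((k + 0.5) / n_cells) for k in range(n_cells)) / n_cells


def relation_terms(delta):
    """Return (int xi, 1 - int psi*psi_bar, (1/2) int xi^2) with xi = 1 - psi_bar."""
    def xi(t):
        return 1.0 - perturbed_eigenfunction(t, delta)
    int_xi = midpoint_integral(xi)
    one_minus_inner = 1.0 - midpoint_integral(lambda t: perturbed_eigenfunction(t, delta))
    half_int_xi_sq = 0.5 * midpoint_integral(lambda t: xi(t) ** 2)
    return int_xi, one_minus_inner, half_int_xi_sq
```

For this choice the three returned quantities coincide and differ from $\delta^2/2$ only at order $\delta^4$.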



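Proof of Lemma 7.15 : The decomposition

$$m\|\bar\psi_1\|_2^2 - (\psi_1^T\bar\psi_1)^2 = (m-1)\sum_{k=1}^m \bar\psi^2(T_{1k}) - \sum_{k \neq k'} \bar\psi(T_{1k})\bar\psi(T_{1k'}) \tag{145}$$

yields

$$\begin{aligned}
E[m\|\bar\psi_1\|_2^2 - (\psi_1^T\bar\psi_1)^2]
&= (m-1)m\int\bar\psi^2 - m(m-1)\Big(\int\bar\psi\Big)^2 = m(m-1)\Big[1 - \Big(1 - \int\xi\Big)^2\Big] \\
&= m(m-1)\Big[2\int\xi - \Big(\int\xi\Big)^2\Big] = m(m-1)\delta^2(1 + O(\delta^2)) \quad \text{(by (144))}.
\end{aligned}$$

Define $\tau = \int\bar\psi$. Then, using (145),

$$\begin{aligned}
\mathrm{Var}&[m\|\bar\psi_1\|_2^2 - (\psi_1^T\bar\psi_1)^2] \\
&= (m-1)^2 E\Big[\sum_{k=1}^m(\bar\psi^2(T_{1k}) - 1)\Big]^2 + \sum_{k_1 \neq k_1'}\sum_{k_2 \neq k_2'} E[(\bar\psi(T_{1k_1})\bar\psi(T_{1k_1'}) - \tau^2)(\bar\psi(T_{1k_2})\bar\psi(T_{1k_2'}) - \tau^2)] \\
&\quad - 2(m-1)\sum_{k_1=1}^m\sum_{k_2 \neq k_2'} E[(\bar\psi^2(T_{1k_1}) - 1)(\bar\psi(T_{1k_2})\bar\psi(T_{1k_2'}) - \tau^2)] \\
&= (m-1)^2 m\,E(\bar\psi^2(T_{11}) - 1)^2 + 2m(m-1)E(\bar\psi(T_{11})\bar\psi(T_{12}) - \tau^2)^2 \\
&\quad + 4m(m-1)(m-2)E[(\bar\psi(T_{11})\bar\psi(T_{12}) - \tau^2)(\bar\psi(T_{11})\bar\psi(T_{13}) - \tau^2)] \\
&\quad - 4m(m-1)^2 E[(\bar\psi^2(T_{11}) - 1)(\bar\psi(T_{11})\bar\psi(T_{12}) - \tau^2)] \\
&= m(m-1)\Big[(m-1)\Big(\int\bar\psi^4 - 1\Big) + 2\Big(\Big(\int\bar\psi^2\Big)^2 - \tau^4\Big) + 4(m-2)\Big(\int\bar\psi^2\int\bar\psi\int\bar\psi - \tau^4\Big)\Big] \\
&\quad - 4m(m-1)^2\Big(\int\bar\psi^3\int\bar\psi - \tau^2\Big) \\
&= m(m-1)\Big[(m-1)\Big(\int(1-\xi)^4 - 1\Big) + 2(1 - \tau^4) + 4(m-2)\tau^2(1 - \tau^2) - 4(m-1)\tau\Big(\int(1-\xi)^3 - \tau\Big)\Big].
\end{aligned}$$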
Simplifying this expression, and using (144), the first term within the square bracket is $(m-1)(4\int\xi^2 - 4\int\xi^3 + \int\xi^4)$, and the last term within the square bracket is $-4(m-1)\tau(2\int\xi^2 - \int\xi^3)$. Collecting
terms and using the fact that $1 - \tau^2 = 2(1 - \tau) - (1 - \tau)^2 = \int\xi^2 - (\int\xi)^2$ (again by (144)), we can
express the sum as

$$\begin{aligned}
&m(m-1)\Big[(4(m-1) + 4 + 4(m-2) - 8(m-1))\int\xi^2 - (4(m-1) - 4(m-1))\int\xi^3 + (m-1)\int\xi^4\Big] \\
&\quad + m(m-1)\Big[-4(1-\tau)^2 - 2(1-\tau^2)^2 - 4(m-2)((1-\tau)^2 + (1-\tau^2)^2) + 4(m-1)(1-\tau)\Big(2\int\xi^2 - \int\xi^3\Big)\Big] \\
&= O(m^3\delta^4).
\end{aligned}$$
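
The mean and variance expressions in this proof can be checked exactly on a toy design. The sketch below is illustrative only: it replaces the uniform design on [0, 1] by a uniform draw from a short list of hypothetical values of $\bar\psi(T)$ chosen so that $E\bar\psi^2(T) = 1$, and compares full enumeration with the closed forms $m(m-1)(1 - \tau^2)$ and $m(m-1)[(m-1)(\mu_4 - 1) + 2(1 - \tau^4) + 4(m-2)\tau^2(1 - \tau^2) - 4(m-1)\tau(\mu_3 - \tau)]$, where $\mu_r = E\bar\psi^r(T)$.

```python
from fractions import Fraction
from itertools import product


def enumerated_mean_and_variance(values, m):
    """Exact mean and variance of m * sum(x_k^2) - (sum x_k)^2 when the x_k
    are i.i.d. uniform on `values`."""
    weight = Fraction(1, len(values) ** m)
    mean = Fraction(0)
    mean_sq = Fraction(0)
    for xs in product(values, repeat=m):
        stat = m * sum(x * x for x in xs) - sum(xs) ** 2
        mean += weight * stat
        mean_sq += weight * stat * stat
    return mean, mean_sq - mean * mean


def closed_form_mean_and_variance(values, m):
    """Closed forms from the proof; requires the second moment to equal 1."""
    count = len(values)
    tau = sum(values) / count
    mu2 = sum(x ** 2 for x in values) / count
    mu3 = sum(x ** 3 for x in values) / count
    mu4 = sum(x ** 4 for x in values) / count
    if mu2 != 1:
        raise ValueError("values must have second moment equal to 1")
    mean = m * (m - 1) * (1 - tau ** 2)
    variance = m * (m - 1) * ((m - 1) * (mu4 - 1)
                              + 2 * (1 - tau ** 4)
                              + 4 * (m - 2) * tau ** 2 * (1 - tau ** 2)
                              - 4 * (m - 1) * tau * (mu3 - tau))
    return mean, variance


# Hypothetical two-point design with E[x^2] = (1/25 + 49/25) / 2 = 1.
TOY_VALUES = [Fraction(1, 5), Fraction(7, 5)]
```

For `m = 2` the statistic reduces to $(x_1 - x_2)^2$, which gives mean 18/25 and variance 324/625 for `TOY_VALUES`; both routes return these values.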

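Proof of Lemma 7.16 : First observe that,

$$\begin{aligned}
E[m - \psi_1^T\bar\psi_1]^2
&= E\Big[\sum_{k=1}^m(1 - \bar\psi(T_{1k}))\Big]^2 = \sum_{k=1}^m E(1 - \bar\psi(T_{1k}))^2 + \sum_{k \neq k'} E[(1 - \bar\psi(T_{1k}))(1 - \bar\psi(T_{1k'}))] \\
&= m\int(\bar\psi - \psi)^2 + m(m-1)\Big(\int(\bar\psi - \psi)\Big)^2 \\
&= m(\delta^2 + O(\delta^4)) + \frac{m(m-1)}{4}(\delta^2 + O(\delta^4))^2 = m\delta^2(1 + o(1)) \quad \text{(by (144))}.
\end{aligned}$$

Next,

$$\begin{aligned}
E\|\bar\psi_1 - \psi_1\|_2^4
&= E\Big[\sum_{k=1}^m(1 - \bar\psi(T_{1k}))^2\Big]^2 \\
&= \sum_{k=1}^m E(1 - \bar\psi(T_{1k}))^4 + \sum_{k \neq k'} E[(1 - \bar\psi(T_{1k}))^2(1 - \bar\psi(T_{1k'}))^2] \\
&= m\int(\bar\psi - \psi)^4 + m(m-1)\Big(\int(\bar\psi - \psi)^2\Big)^2 \\
&\le m\|\bar\psi - \psi\|_\infty^2\int(\bar\psi - \psi)^2 + m(m-1)\Big(\int(\bar\psi - \psi)^2\Big)^2 \\
&= O(m\delta^4) + m(m-1)\delta^4(1 + o(1)) = O(m^2\delta^4),
\end{aligned}$$

where in the last step we used (iv') and (144).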
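
Both expansions in this proof split a squared sum of i.i.d. terms into diagonal and off-diagonal parts. A minimal exact check on a toy design, where the list `xi_values` is a hypothetical stand-in for the values of $1 - \bar\psi(T)$ and the design is uniform on that list rather than on [0, 1]:

```python
from fractions import Fraction
from itertools import product


def enumerated_moments(xi_values, m):
    """Exact E[(sum_k xi_k)^2] and E[(sum_k xi_k^2)^2] for i.i.d. xi_k
    drawn uniformly from xi_values."""
    weight = Fraction(1, len(xi_values) ** m)
    second = Fraction(0)
    fourth = Fraction(0)
    for xs in product(xi_values, repeat=m):
        second += weight * sum(xs) ** 2
        fourth += weight * sum(x * x for x in xs) ** 2
    return second, fourth


def split_moments(xi_values, m):
    """Diagonal plus off-diagonal form used in the proof."""
    count = len(xi_values)
    mean1 = sum(xi_values) / count
    mean2 = sum(x ** 2 for x in xi_values) / count
    mean4 = sum(x ** 4 for x in xi_values) / count
    return (m * mean2 + m * (m - 1) * mean1 ** 2,
            m * mean4 + m * (m - 1) * mean2 ** 2)


# Hypothetical small deviations of psi_bar from psi = 1.
XI_VALUES = [Fraction(-1, 10), Fraction(1, 20), Fraction(3, 20)]
```

The off-diagonal term carries the factor $m(m-1)$, which is why the second bound is of order $m^2\delta^4$ while the first stays of order $m\delta^2$ once $m\delta^2 \to 0$.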
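Proof of Lemma 7.17 : Use (145) to write the expectation as

$$\begin{aligned}
&(m-1)E\Big[\sum_{k=1}^m\bar\psi^2(T_{1k})\Big]^2 - E\Big[\sum_{k_1=1}^m\bar\psi^2(T_{1k_1})\sum_{k_2 \neq k_2'}\bar\psi(T_{1k_2})\bar\psi(T_{1k_2'})\Big] \\
&= (m-1)\Big[\sum_{k=1}^m E\bar\psi^4(T_{1k}) + \sum_{k \neq k'} E(\bar\psi^2(T_{1k})\bar\psi^2(T_{1k'}))\Big] \\
&\quad - \Big[\sum_{k_1 = k_2 \neq k_2'} E(\bar\psi^3(T_{1k_2})\bar\psi(T_{1k_2'})) + \sum_{k_1 = k_2' \neq k_2} E(\bar\psi^3(T_{1k_2'})\bar\psi(T_{1k_2})) + \sum_{k_1 \neq k_2 \neq k_2'} E(\bar\psi^2(T_{1k_1})\bar\psi(T_{1k_2})\bar\psi(T_{1k_2'}))\Big] \\
&= (m-1)\Big[m\int\bar\psi^4 + m(m-1)\Big(\int\bar\psi^2\Big)^2\Big] - \Big[2m(m-1)\Big(\int\bar\psi^3\Big)\Big(\int\bar\psi\Big) + m(m-1)(m-2)\Big(\int\bar\psi^2\Big)\Big(\int\bar\psi\Big)^2\Big] \\
&= m(m-1)\Big[\int(1-\xi)^4 + (m-1) - 2\Big(\int(1-\xi)^3\Big)\Big(\int(1-\xi)\Big) - (m-2)\Big(\int(1-\xi)\Big)^2\Big] \\
&= m(m-1)\Big[m\int\xi^2 + O(\delta^3) + O(m\delta^4)\Big] \\
&= m^2(m-1)\delta^2(1 + o(1)),
\end{aligned}$$

where in the fourth and last steps we used (144) and (iv').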